Zero-Downtime Cluster Upgrades
Upgrade Kubernetes versions, node pools, and cluster infrastructure without downtime or risk. Codiac's blue/green cluster approach eliminates the anxiety of in-place upgrades by creating a fresh cluster and migrating workloads seamlessly.
The Codiac CLI guides you through each command interactively. You don't need to memorize flags; just run the command and the CLI prompts you for what it needs.
The Problem: Upgrade Anxiety
Traditional Kubernetes cluster upgrades are high-risk operations:
- In-place version upgrades risk breaking production workloads
- Node pool rolling updates cause cascading pod evictions
- Manual coordination required across teams for upgrade windows
- Rollback complexity when upgrades fail mid-process
- Testing challenges - can't validate new version without affecting production
- Multi-hour maintenance windows required for large clusters
Statistics:
- Gartner forecasts 75% of orgs will run containerized apps by 2026
- Average cluster upgrade takes 2-4 hours (AWS EKS best practices)
- Most teams delay upgrades due to risk, falling behind on security patches
How Codiac Solves It: Blue/Green Cluster Migration
Instead of in-place upgrades, Codiac uses a blue/green cluster strategy:
- Create new cluster with desired Kubernetes version
- Deploy workloads to new cluster (using existing snapshots)
- Validate new cluster with production traffic
- Switch traffic from old cluster to new cluster
- Decommission old cluster when confident
Key advantage: Old cluster remains untouched until you're confident in the new one. Instant rollback if issues arise.
Quick Start: Blue/Green Cluster Upgrade
The workflow:
1. Provision new cluster with your desired Kubernetes version (via Terraform, eksctl, or cloud console)
2. Capture it into Codiac:
codiac cluster capture
The CLI prompts you for provider, region, and cluster name.
3. Deploy your existing workloads from snapshot:
codiac snapshot deploy
The CLI prompts you to select a snapshot version and target cabinet.
4. Validate the new cluster (run tests, check health)
5. Switch DNS to point to the new cluster's ingress
6. Destroy the old cluster when confident:
codiac cluster destroy
Downtime: Zero. Old cluster serves traffic until DNS cutover completes.
Rollback: Point DNS back to the old cluster; it's still running.
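For automation, the same workflow can be scripted end to end. A minimal sketch combining the scripted flags shown later on this page; the cluster, account, and cabinet names are placeholders:
# Capture the freshly provisioned cluster into Codiac
codiac cluster capture -n prod-2024-q4 -p aws -s 123456789012 -l us-east-1
# Redeploy the latest snapshot of your workloads to a cabinet on the new cluster
codiac snapshot deploy --version latest --cabinet prod-2024-q4
# ...validate, cut DNS over to the new ingress, then retire the old cluster
codiac cluster destroy -n prod-old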
Detailed Workflow
Phase 1: Provision New Cluster
Create a fresh cluster with your desired Kubernetes version using your preferred method:
- Terraform/Pulumi: Use your existing IaC to provision the new cluster
- Cloud console: Create via AWS/Azure/GCP console
- eksctl/az aks/gcloud: Use cloud-specific CLI tools
- Codiac: Run codiac cluster create for guided provisioning
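For example, a minimal eksctl sketch for provisioning on AWS; the cluster name, region, version, and node count are illustrative:
# Provision a fresh EKS cluster on the target Kubernetes version
eksctl create cluster --name prod-2024-q4 --region us-east-1 --version 1.31 --nodes 3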
Then capture it into Codiac:
codiac cluster capture
The CLI prompts you for:
- Cloud provider (AWS, Azure, GCP)
- Account/subscription ID
- Region
- Cluster name
Scripted mode (for automation)
# AWS
codiac cluster capture -n prod-2024-q4 -p aws -s 123456789012 -l us-east-1
# Azure
codiac cluster capture -n prod-2024-q4 -p azure -s your-subscription-id -l eastus -g your-resource-group
Time: 10-15 minutes (cloud provider dependent)
Phase 2: Migrate Workloads
Deploy your existing workloads to the new cluster using snapshots:
codiac snapshot deploy
The CLI prompts you to:
- Select a snapshot version (Codiac automatically creates snapshots on each deployment)
- Choose the target cabinet on the new cluster
First, see available snapshots:
codiac snapshot list
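In scripted mode the same step can be driven non-interactively. A minimal sketch, reusing the --version and --cabinet flags shown in the comparison section below; the cabinet name is a placeholder:
# Deploy the latest snapshot to a cabinet on the new cluster
codiac snapshot deploy --version latest --cabinet prod-2024-q4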
What gets deployed:
- ✅ Application deployments (same image versions)
- ✅ Configuration (environment variables)
- ✅ Secrets (re-provisioned from your secret store)
- ✅ Asset configurations and settings
What requires separate handling:
- ❌ Persistent volumes (requires backup/restore or database migration)
- ❌ Pod-specific state (ephemeral by nature)
Time: 2-5 minutes per cabinet
Phase 3: Validate New Cluster
Verify workloads are running:
# View cabinet status in the new cluster
codiac cabinet list
The web UI at app.codiac.io provides a dashboard view of all pods, their health status, and logs.
Validation checklist:
- All pods running and healthy
- Ingress returning 200 OK responses
- Certificates valid and trusted
- Database connectivity working
- External API integrations responding
- Logging and monitoring operational
- Performance metrics within acceptable range
Testing strategies:
- Parallel testing: New cluster receives copy of production traffic (via traffic mirroring at ingress level)
- Canary testing: Route small % of real users to new cluster via DNS or load balancer
- Synthetic testing: Run your existing test suites against new cluster endpoints
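A quick synthetic smoke test can run before any DNS change by pinning your hostname to the new cluster's ingress IP. A minimal curl sketch; the hostname, health path, and IP are placeholders:
# Hit the new cluster while public DNS still points at the old one
curl --resolve app.mycompany.com:443:203.0.113.10 -sS -o /dev/null -w "%{http_code}\n" https://app.mycompany.com/health
# Inspect the certificate the new ingress presents
curl --resolve app.mycompany.com:443:203.0.113.10 -vI https://app.mycompany.com 2>&1 | grep -E "subject:|expire"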
Phase 4: Traffic Cutover
Switch DNS to the new cluster:
Get your new cluster's ingress IP from the Codiac web UI or your cloud provider console, then update your DNS records to point to the new cluster.
What happens:
- Update your DNS records to point to the new cluster's ingress IP
- DNS TTL honored (typically 60-300 seconds for propagation)
- Old cluster continues serving requests until DNS propagates
- Once all DNS resolvers updated, traffic flows to new cluster
- Old cluster remains running but idle (safe to decommission after validation)
DNS cutover process:
# Get new cluster info
codiac cluster list
# Update your DNS provider with the new ingress IP:
# A record: *.mycompany.com → [new-cluster-ingress-ip]
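If your DNS is hosted in Amazon Route 53, for example, the cutover is a single UPSERT. A minimal sketch; the hosted zone ID, record name, and IP are placeholders:
# Point the wildcard record at the new cluster's ingress IP
aws route53 change-resource-record-sets --hosted-zone-id Z0123456789ABC --change-batch '{
  "Changes": [{
    "Action": "UPSERT",
    "ResourceRecordSet": {
      "Name": "*.mycompany.com",
      "Type": "A",
      "TTL": 60,
      "ResourceRecords": [{ "Value": "203.0.113.10" }]
    }
  }]
}'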
Rollback if issues detected:
Simply point your DNS back to the old cluster's ingress IP - it's still running and serving traffic.
Phase 5: Decommission Old Cluster
Wait period: 24-72 hours (ensure new cluster stable)
Monitor your new cluster using your existing observability tools (Datadog, Prometheus, etc.) to verify it's handling all traffic correctly.
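Before destroying anything, it helps to confirm the old cluster has actually gone quiet. A minimal sketch, assuming an NGINX ingress controller with the default ingress-nginx names and a kubectl context still pointing at the old cluster; adjust names to your installation:
# Count requests the old cluster's ingress served in the last hour (should trend toward zero)
kubectl --context prod-old -n ingress-nginx logs deploy/ingress-nginx-controller --since=1h | wc -l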
When confident, destroy the old cluster:
codiac cluster destroy
The CLI prompts you to select which cluster to destroy.
Scripted mode (for automation)
codiac cluster destroy -n prod-old
Cost savings: Eliminates old cluster infrastructure costs immediately.
Comparison: Codiac vs Traditional Upgrades
| Aspect | Traditional In-Place Upgrade | Codiac Blue/Green Migration |
|---|---|---|
| Downtime | 5-30 minutes (node draining, rolling restarts) | Zero (traffic cutover instant) |
| Risk | High (can't rollback mid-upgrade) | Low (old cluster untouched) |
| Rollback | Complex (restore etcd backup, re-apply manifests) | Instant (revert DNS) |
| Testing | Limited (can't test without affecting prod) | Full (new cluster isolated) |
| Upgrade time | 2-4 hours (manual steps, coordination) | 30 minutes (automated) |
| Team coordination | Required (maintenance window) | Optional (anytime cutover) |
| Node pool changes | Risky (cascading evictions) | Safe (fresh nodes) |
| Configuration migration | Manually re-apply all ConfigMaps, Secrets, RBAC, network policies per cluster | Use codiac snapshot deploy - entire configuration copied automatically |
| Documentation updates | Update runbooks, cluster-specific docs, team wikis for new version | Zero - Codiac snapshots are self-documenting, no cluster-specific docs needed |
Traditional upgrade: 30 minutes per service × 40 services = 20 hours re-applying configurations, troubleshooting version compatibility, updating documentation.
Codiac snapshot migration: codiac snapshot deploy --version latest --cabinet new-cluster = 2 minutes. Complete configuration state replicated automatically.
Because Codiac captures your entire configuration in snapshots, you never reconfigure the same thing twice. Upgrade 5 clusters or 50 clusters - same effort. No GitOps drift, no forgotten ConfigMaps, no wiki pages to update.
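For example, replicating every environment onto a new cluster can be a short loop. A minimal sketch, assuming cabinet names of your own and the scripted flags shown above:
# Redeploy the latest snapshot of each cabinet onto the new cluster
for cab in dev staging prod; do
  codiac snapshot deploy --version latest --cabinet "$cab"
done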
Common Upgrade Scenarios
Scenario 1: Kubernetes Version Upgrade
Goal: Upgrade from K8s 1.29 to 1.31 (skip 1.30)
Steps:
1. Provision new cluster with K8s 1.31 using your IaC (Terraform, eksctl, etc.)
2. Capture the new cluster:
codiac cluster capture
3. Deploy workloads from snapshot:
codiac snapshot deploy
Select your latest snapshot and the target cabinet on the new cluster.
4. Validate - Run your test suites against the new cluster
5. Cutover - Update DNS to point to the new cluster
6. Destroy old cluster when confident:
codiac cluster destroy
Time: Under an hour, zero downtime
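When skipping a minor version, it's worth checking whether anything still calls APIs removed in the target release before you migrate. A minimal sketch using the apiserver's built-in deprecation metric (requires permission to read the /metrics endpoint):
# List deprecated API groups/versions that clients still call on the old cluster
kubectl get --raw /metrics | grep apiserver_requested_deprecated_apis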
Scenario 2: Change Node Instance Types
Goal: Upgrade from m5.large to m5.xlarge for better performance
Steps:
1. Provision new cluster with m5.xlarge nodes using your IaC
2. Capture the new cluster:
codiac cluster capture
3. Create cabinets and deploy from snapshot:
codiac cabinet create
codiac snapshot deploy
4. Validate performance on the larger nodes (see the capacity check below)
5. Cutover - Update DNS to point to the new cluster
6. Destroy old cluster when confident
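To confirm the new nodes expose the expected capacity during step 4, a quick check against the new cluster:
# Compare allocatable CPU and memory on the new m5.xlarge nodes
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu,MEMORY:.status.allocatable.memory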
Scenario 3: Multi-Region Cluster Creation
Goal: Create new cluster in EU region to reduce latency for European users
Steps:
1. Provision EU cluster using your IaC or cloud console
2. Capture the EU cluster:
codiac cluster capture
Select your EU region when prompted.
3. Create cabinets and deploy from snapshot:
codiac cabinet create
codiac snapshot deploy
4. Configure geo-routing at your DNS provider or load balancer to route European users to the EU cluster
This isn't an upgrade but demonstrates the cluster replication pattern.
Advanced: Gradual Migration (Canary Upgrade)
Migrate cabinets one at a time instead of all-at-once:
Steps:
1. Provision new cluster with your target Kubernetes version
2. Capture the new cluster:
codiac cluster capture
3. Create a low-risk cabinet first:
codiac cabinet create
Create the cabinet on the new cluster.
4. Deploy a single workload from snapshot:
codiac snapshot deploy
Deploy only the low-risk workload to test.
5. Update DNS for just this workload to point to the new cluster (see the weighted-DNS sketch below)
6. Monitor for 24 hours, then migrate the next workload
7. Repeat until all cabinets are migrated
Advantage: Minimize blast radius if compatibility issues arise.
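If your DNS provider supports weighted records, the per-workload cutover in step 5 can start as a small canary. A minimal Route 53 sketch sending roughly 10% of traffic to the new cluster; the hosted zone ID, hostname, and IPs are placeholders:
# Weighted A records: 90% to the old cluster, 10% to the new cluster
aws route53 change-resource-record-sets --hosted-zone-id Z0123456789ABC --change-batch '{
  "Changes": [
    { "Action": "UPSERT", "ResourceRecordSet": { "Name": "api.mycompany.com", "Type": "A", "SetIdentifier": "old", "Weight": 90, "TTL": 60, "ResourceRecords": [{ "Value": "203.0.113.10" }] } },
    { "Action": "UPSERT", "ResourceRecordSet": { "Name": "api.mycompany.com", "Type": "A", "SetIdentifier": "new", "Weight": 10, "TTL": 60, "ResourceRecords": [{ "Value": "203.0.113.20" }] } }
  ]
}'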
Handling Persistent Volumes
Challenge: Persistent volumes (PVs) are tied to specific clusters/nodes.
Migration strategies:
Option 1: Backup and Restore (Recommended)
Use your cloud provider's snapshot tools to backup and restore persistent volumes:
- Backup data before migration using cloud-native tools (EBS snapshots, Azure Disk snapshots, etc.)
- Provision new cluster and capture it into Codiac
- Create cabinet and deploy workloads from snapshot
- Restore data to the new cluster's volumes
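On AWS with the EBS CSI driver, for example, the backup step can look like the sketch below; the PV name is a placeholder, and Azure/GCP have equivalent disk snapshot commands:
# Find the EBS volume backing the PV, then snapshot it before migrating
VOL_ID=$(kubectl get pv my-data-pv -o jsonpath='{.spec.csi.volumeHandle}')
aws ec2 create-snapshot --volume-id "$VOL_ID" --description "pre-migration backup of my-data-pv"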
Option 2: Managed Database Migration
For production databases, use managed services:
- AWS: RDS snapshot → restore to new RDS instance → update connection string
- Azure: Azure SQL backup → point-in-time restore
- GCP: Cloud SQL automated backups
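As one example, the AWS path can be scripted with the standard RDS commands; the instance identifiers are placeholders:
# Snapshot the existing instance and restore a copy for the new cluster to use
aws rds create-db-snapshot --db-instance-identifier prod-db --db-snapshot-identifier prod-db-pre-migration
aws rds restore-db-instance-from-db-snapshot --db-instance-identifier prod-db-new --db-snapshot-identifier prod-db-pre-migration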
Update configuration for the new cluster:
codiac config set
The CLI prompts you to set the new database connection string for the appropriate cabinet.
Option 3: Live Replication (Zero Downtime for Databases)
For stateful apps requiring continuous availability:
- Set up database replication from old cluster to new cluster
- Let replication catch up (lag < 1 second)
- Cutover application to new cluster
- Promote new database replica to primary
Codiac doesn't automate database replication, but integrates with your existing replication tools.
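How you measure "caught up" depends on the database. As one example, a minimal check on a Postgres streaming replica; connection details are placeholders:
# Replication lag on the replica; aim for well under a second before cutover
psql -h new-db.internal -U postgres -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"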
Troubleshooting
Pods Not Starting in New Cluster
Symptoms: Pods stuck in Pending or CrashLoopBackOff state after migration.
Common causes:
- Node resources insufficient - New cluster nodes smaller than old cluster
- Image pull errors - Registry authentication not configured
- Configuration missing - Secrets/ConfigMaps not migrated
- API version incompatibility - Deprecated APIs in new K8s version
Debug:
Use the Codiac web UI at app.codiac.io to view pod status, logs, and events. You can also use kubectl directly:
kubectl get pods -n your-namespace
kubectl describe pod <pod-name> -n your-namespace
kubectl get nodes
Certificate Issues After Migration
Symptoms: HTTPS not working on new cluster, cert-manager errors.
Cause: TLS certificates tied to old cluster's ingress controller.
Solution: DNS must point to the new cluster for Let's Encrypt HTTP-01 challenge to succeed. Once DNS is updated:
- Cert-manager will automatically request new certificates
- Monitor certificate status in the Codiac web UI or via kubectl:
kubectl get certificates -n your-namespace
kubectl describe certificate <cert-name> -n your-namespace
If certificates aren't issuing, verify your ingress controller is properly configured and DNS is resolving to the new cluster.
DNS Propagation Delays
Symptoms: Some users still hitting old cluster hours after cutover.
Cause: DNS caching by ISPs or client resolvers.
Solutions:
- Lower DNS TTL before upgrade (set to 60 seconds, 24 hours in advance)
- Wait for propagation (typically complete within 5 minutes for properly configured DNS)
- Parallel operation (run both clusters simultaneously for 24 hours)
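You can verify what resolvers are actually returning (and the remaining TTL) with dig; the hostname is a placeholder:
# Answer and TTL from a public resolver vs. your default resolver
dig +noall +answer app.mycompany.com @8.8.8.8
dig +noall +answer app.mycompany.com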
Cost Optimization
Cluster upgrades cost money (two clusters running simultaneously).
Minimize costs:
- Compress timeline - Complete migration within hours, not days
- Downsize old cluster - Scale down old cluster nodes immediately after cutover
- Off-hours upgrades - Perform during low-traffic periods
- Spot instances for new cluster - Use spot/preemptible nodes during testing phase
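For the downsizing step above, a minimal eksctl sketch; cluster and nodegroup names are placeholders, and you may also need to lower the nodegroup minimum:
# Keep the old cluster alive for rollback, but on a single node
eksctl scale nodegroup --cluster prod-old --name workers --nodes 1 --nodes-min 1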
Typical cost:
- Running both clusters for 4 hours = ~16% of one day's cluster cost (roughly 0.5% of the monthly bill)
- Far cheaper than downtime or failed upgrade recovery
Related Documentation
- Multi-Cluster Management - Managing multiple clusters simultaneously
- System Versioning - Snapshot-based deployments for migrations
- Certificate Management - TLS certificate handling during upgrades
- Dynamic Configuration - Configuration updates across clusters
- CLI: Cluster Management - Command reference
FAQ
Q: Can I upgrade without creating a new cluster?
A: Not recommended. In-place upgrades are higher risk and harder to roll back. Codiac's blue/green approach is safer.
Q: How long can I run both clusters in parallel?
A: As long as needed. Some teams run parallel for 72 hours to ensure stability. Most complete within 4-24 hours.
Q: Do I need downtime for DNS cutover?
A: No. DNS propagation means both clusters serve traffic during transition (60-300 seconds typically).
Q: Can I upgrade multiple environments at once?
A: Yes, but upgrade dev/staging first to validate process before upgrading production.
Q: What if the new cluster has issues after cutover?
A: Roll back instantly by pointing DNS back to the old cluster. It remains running until you delete it.
Q: Do you support in-place node pool upgrades?
A: Codiac recommends blue/green cluster migration for production upgrades. For in-place node pool upgrades, use your cloud provider's native tools (eksctl, az aks, gcloud) and Codiac will continue managing your workloads.
Start your first zero-downtime upgrade: Create a test cluster




