Kubernetes Autoscaling
Automatically scale your application workloads based on demand. Codiac integrates with Kubernetes Horizontal Pod Autoscaler (HPA) to dynamically adjust replica counts based on CPU, memory, or custom metrics.
Autoscaling is configured through the Codiac web UI at app.codiac.io. Navigate to your asset and configure scaling parameters in the visual interface.
What Is Autoscaling?
Autoscaling automatically adjusts the number of running pods (replicas) for an asset based on observed metrics. When load increases, Kubernetes adds more replicas. When load decreases, it scales back down.
Business Value:
- Cost optimization: Scale down during low traffic and reduce cloud spending, often by 40-60% for bursty workloads
- Performance: Scale up automatically during demand spikes, maintain responsiveness
- Zero manual intervention: No need to manually adjust replica counts
- Resilience: Handle unexpected traffic bursts without capacity planning
How Autoscaling Works
Low Traffic (10% CPU)              High Traffic (80% CPU)

┌─────────┐                        ┌─────────┐ ┌─────────┐ ┌─────────┐
│  Pod 1  │      Scales Up →       │  Pod 1  │ │  Pod 2  │ │  Pod 3  │
└─────────┘                        └─────────┘ └─────────┘ └─────────┘
 1 replica                          3 replicas

     Load Decreases                     Load Increases Again
           ↓                                    ↑
     Scales Down  ←  [Cooldown Period]  →  Scales Up
Key Concepts:
- Min Replicas: Minimum number of pods (never scale below this)
- Max Replicas: Maximum number of pods (never scale above this)
- Target Metric: CPU/memory utilization percentage that triggers scaling
- Cooldown Period: Wait time between scaling actions to prevent thrashing
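These concepts map directly onto the Kubernetes HorizontalPodAutoscaler resource that Codiac creates for you. A minimal autoscaling/v2 sketch (the my-api name and the values are illustrative):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2                       # never scale below this
  maxReplicas: 10                      # never scale above this
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70         # target CPU percentage
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # cooldown to prevent thrashing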
Supported Metrics
| Metric Type | What It Measures | When to Use |
|---|---|---|
| CPU | Average CPU utilization across all pods | Most common, works for CPU-bound apps (APIs, web servers) |
| Memory | Average memory usage across all pods | Memory-intensive apps (caching, data processing) |
| Custom Metrics | Application-specific metrics (requests/sec, queue depth) | Advanced scenarios requiring domain-specific scaling |
Configuring Autoscaling in Web UI
Manage autoscaling through Codiac's visual interface with real-time scaling metrics.
Step 1: Navigate to Asset
- Open the Codiac web UI at https://app.codiac.io
- Select your Enterprise
- Select the Environment
- Navigate to the Cabinet
- Click on your Asset
Step 2: Configure Autoscaler
- Click Scaling or Configure tab
- Toggle Enable Autoscaling to ON
- Configure scaling parameters:
Min Replicas:
- Minimum number of pods to maintain
- Recommended: At least 2 for high availability
- Never scales below this number
Max Replicas:
- Maximum number of pods allowed
- Set based on expected peak load
- Prevents runaway scaling costs
Target CPU Utilization (%):
- CPU percentage that triggers scaling
- Recommended: 60-80% for APIs, 80-90% for batch jobs
- Leave blank to disable CPU-based scaling
Target Memory Utilization (%):
- Memory percentage that triggers scaling
- Recommended: 70-85%
- Leave blank to disable memory-based scaling
Step 3: Save and Deploy
- Click Save to persist autoscaler configuration
- Click Deploy to apply changes
- Monitor deployment progress
Expected Outcome:
- HPA created in Kubernetes
- Asset scales based on configured metrics
- Real-time scaling visible in UI dashboard
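To confirm that the HPA object exists, you can also check from the command line (placeholders as in the kubectl section below):

kubectl get hpa <asset-name> -n <namespace>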
Example Configurations
High-Traffic API
| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 3 | High availability across 3 AZs |
| Max Replicas | 50 | Handle 10x normal traffic |
| CPU Target | 60% | Keep latency low |
Background Worker
| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 1 | Cost optimization |
| Max Replicas | 10 | Limit concurrent jobs |
| CPU Target | 85% | Maximize resource usage |
Caching Layer
| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 2 | High availability |
| Max Replicas | 10 | Memory-bound |
| Memory Target | 75% | Scale on memory pressure |
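For a memory-driven setup like the caching layer above, the underlying HPA uses a memory resource metric instead of CPU. A sketch of the relevant autoscaling/v2 fragment (Codiac generates the full resource):

metrics:
- type: Resource
  resource:
    name: memory
    target:
      type: Utilization
      averageUtilization: 75   # scale when average memory exceeds 75% of requests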
Monitoring Autoscaler Activity
Scaling Metrics Dashboard
- Navigate to asset in web UI
- View Scaling Metrics section
- See real-time data:
- Current replica count
- Current CPU/memory utilization
- Target thresholds
- Scaling history (last 24 hours)
Metrics Displayed:
- Current Replicas: Number of running pods
- CPU Usage: Average across all pods (e.g., "68% / 70% target")
- Memory Usage: Average across all pods
- Last Scaled: Timestamp and action (scaled up/down, from X to Y replicas)
Scaling Event History
View historical scaling events:
- Click History or Events tab
- Filter by "Scaling Events"
- See timeline of scale-up/scale-down actions
Example Events:
14:32 - Scaled up from 3 to 5 replicas (CPU: 82%)
15:45 - Scaled down from 5 to 4 replicas (CPU: 58%)
Editing Autoscaler Configuration
- Navigate to asset Scaling tab
- Modify min/max replicas or target percentages
- Click Save and Deploy
- Changes take effect within 1-2 minutes
Disabling Autoscaling
- Navigate to asset Scaling tab
- Toggle Enable Autoscaling to OFF
- Set Fixed Replicas count
- Click Save and Deploy
Result:
- Autoscaler removed
- Asset runs with fixed replica count
- No automatic scaling
Viewing Autoscaler Status with kubectl
While autoscaling is configured through the Codiac web UI, you can monitor autoscaler status directly with kubectl.
View HPA Status
kubectl get hpa -n <namespace>
Expected Output:
NAME     REFERENCE        TARGETS   MINPODS   MAXPODS   REPLICAS
my-api   Deployment/...   68%/70%   2         10        4
Reading the Output:
- TARGETS: Current metric / target metric
- REPLICAS: Current number of running pods
- 68%/70%: Current CPU is 68%, target is 70% (no scaling needed)
Watch Real-Time Scaling
kubectl get hpa -w -n <namespace>
Detailed HPA Information
kubectl describe hpa <asset-name> -n <namespace>
View Pod Metrics
kubectl top pods -n <namespace>
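Illustrative output (pod names and values are examples):

NAME                     CPU(cores)   MEMORY(bytes)
my-api-7d4b9c8f6-abcde   120m         190Mi
my-api-7d4b9c8f6-fghij   95m          182Mi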
Troubleshooting Autoscaling
Problem: Autoscaler not scaling up despite high CPU
Possible Causes:
- Already at max replicas
- Metrics server not installed/working
- Resource requests not defined
Debug Commands:
# Verify metrics server
kubectl top pods -n <namespace>
# Ensure resource requests are set (required for HPA)
kubectl describe deployment <asset-name> -n <namespace> | grep -A 5 "Requests"
Fix: Resource requests must be defined for HPA to work:
resources:
  requests:
    cpu: 100m
    memory: 128Mi
Configure resource requests in the Codiac web UI under asset settings.
Problem: Autoscaler thrashing (constantly scaling up/down)
Cause: Target metric too close to actual usage, causing constant adjustments.
Solution: Increase target threshold in the web UI (e.g., from 70% to 80%).
Problem: Scaling too slowly during traffic spike
Cause: Scale-up speed is bounded by the HPA metric sync interval, pod startup time, and the default scale-up rate limits (see the sketch after the solutions below).
Solutions:
- Increase max replicas to allow more headroom
- Lower target CPU to trigger scaling earlier
- Use predictive scaling (schedule-based scaling for known patterns)
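If you manage HPA manifests directly, this pacing is controlled by the autoscaling/v2 behavior field. A sketch of a more aggressive scale-up policy (whether Codiac's UI exposes these knobs is not covered here):

behavior:
  scaleUp:
    stabilizationWindowSeconds: 0   # react immediately to rising load
    policies:
    - type: Percent
      value: 100         # allow doubling the replica count...
      periodSeconds: 15  # ...every 15 seconds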
Autoscaling Best Practices
1. Always Set Resource Requests
Autoscaling requires CPU/memory requests to be defined on containers:
resources:
  requests:
    cpu: 200m       # Required for CPU-based autoscaling
    memory: 256Mi   # Required for memory-based autoscaling
  limits:
    cpu: 500m
    memory: 512Mi
Why: HPA calculates utilization as (actual usage / requested resources). Without requests, HPA doesn't know what "70% CPU" means.
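Concretely, HPA computes the desired replica count as (per the Kubernetes documentation):

desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)

For example, 4 replicas averaging 90% CPU against a 70% target gives ceil(4 × 90 / 70) = 6 replicas.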
2. Start Conservative, Then Optimize
Initial Configuration:
- Min: 2-3 replicas (for HA)
- Max: 2-3x your average expected load
- CPU Target: 70%
Monitor for 1-2 weeks, then adjust:
- Too many scale-ups? Lower target CPU
- Too many scale-downs? Raise target CPU
- Hitting max frequently? Increase max replicas
3. Set Appropriate Min/Max Bounds
Min Replicas:
- Production: Minimum 2 for zero-downtime deployments
- Staging/Dev: Can be 1 to save costs
- HA-Critical Services: Minimum 3 across availability zones
Max Replicas:
- Calculate: (Peak Traffic / Per-Pod Capacity) × 1.5 for a 50% buffer; for example, a 10,000 req/s peak with pods handling 500 req/s each gives (10,000 / 500) × 1.5 = 30 max replicas
- Consider cost limits (max replicas × cost per pod)
- Set alerts when approaching max
4. Use CPU for Most Use Cases
CPU-based scaling works best for:
- Web APIs and microservices
- Request-driven workloads
- Most containerized applications
Memory-based scaling better for:
- Caching layers (Redis, Memcached)
- In-memory databases
- Data processing pipelines
5. Avoid Autoscaling Stateful Workloads
Don't autoscale:
- Databases (use read replicas instead)
- Stateful sets with persistent volumes
- Singleton services (queue consumers with global state)
Use autoscaling for:
- Stateless APIs
- Background workers (if idempotent)
- Read-only services
6. Test Scaling Behavior
Before production, verify autoscaling works:
Load Test:
# Generate artificial load
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh
# Inside the pod, generate sustained request load to drive up CPU on the service's pods
while true; do wget -q -O- http://my-api-service; done
Watch Scaling:
kubectl get hpa -w -n prod
Expected: Replicas increase within 1-3 minutes as CPU rises.
7. Monitor Costs
Autoscaling can increase cloud costs if not monitored:
- Set max replica limits
- Use Zombie Mode for non-prod environments
- Track cost per replica and set budgets
- Alert on sustained high replica counts
Common Autoscaling Patterns
Pattern 1: User-Facing Web API
Min: 3 replicas (HA across 3 AZs)
Max: 50 replicas (handles 10x normal traffic)
CPU Target: 60% (low latency)
Pattern 2: Background Job Processor
Min: 1 replica (cost optimization)
Max: 20 replicas (limit concurrent jobs)
CPU Target: 85% (maximize resource usage)
Pattern 3: Caching Layer
Min: 2 replicas (HA)
Max: 10 replicas (memory-bound)
Memory Target: 75%
Pattern 4: Event-Driven Microservice
Min: 2 replicas (HA)
Max: 100 replicas (burst capacity)
CPU Target: 70%
Custom Metric: Messages in queue
Advanced: Custom Metrics
For autoscaling based on application-specific metrics (e.g., queue depth, requests/second), Kubernetes supports custom metrics via adapters.
Common Custom Metrics:
- RabbitMQ queue depth
- HTTP requests per second
- Active connections
- Custom business metrics
Setup: Requires installing metrics adapter (Prometheus, Datadog, etc.) and configuring HPA with custom metric source.
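Once an adapter exposes the metric, the HPA references it as a Pods or External metric source. A sketch using a per-pod requests-per-second metric (the name http_requests_per_second is illustrative and must match what your adapter publishes):

metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100"   # add replicas to keep ~100 req/s per pod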
Documentation: Kubernetes HPA Walkthrough (https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/)
FAQ
Q: How quickly does autoscaling react to load changes?
A: HPA checks metrics every 15 seconds by default. Scale-up happens within 1-3 minutes. Scale-down is more conservative (5 minutes) to avoid thrashing.
Q: Can I schedule scaling (e.g., scale up during business hours)?
A: HPA is reactive, not scheduled. For predictive scaling, use CronJobs to adjust min/max replicas based on time of day, or use Kubernetes Vertical Pod Autoscaler (VPA) for right-sizing.
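A sketch of the CronJob approach, patching minReplicas ahead of business hours (the image, schedule, and hpa-editor service account are assumptions; the account needs RBAC permission to patch HPAs):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-business-hours
spec:
  schedule: "0 8 * * 1-5"   # 08:00, Monday through Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-editor   # assumed account with patch rights on HPAs
          restartPolicy: OnFailure
          containers:
          - name: kubectl
            image: bitnami/kubectl         # assumed kubectl-capable image
            command:
            - kubectl
            - patch
            - hpa
            - my-api
            - -p
            - '{"spec": {"minReplicas": 5}}'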
Q: What's the difference between HPA and VPA?
A: HPA (Horizontal Pod Autoscaler) adds/removes pods. VPA (Vertical Pod Autoscaler) adjusts CPU/memory limits per pod. HPA is more common.
Q: Does autoscaling work during deployments?
A: Yes, HPA continues to operate during rolling updates. New pods are added to the replica count as they become ready.
Q: Can I autoscale to zero replicas?
A: No, min replicas must be at least 1. For zero-scaling, use Knative or KEDA (event-driven autoscaling).
Related Documentation
- Probes - Health checks for autoscaled pods
- Asset Management - Deploying and configuring assets
- Zombie Mode - Cost optimization for non-prod environments
- Glossary: Autoscaling
Need help optimizing autoscaling? Contact Support or check our performance tuning guide.