Kubernetes Autoscaling

Automatically scale your application workloads based on demand. Codiac integrates with the Kubernetes Horizontal Pod Autoscaler (HPA) to dynamically adjust replica counts based on CPU, memory, or custom metrics.

Web UI Configuration

Autoscaling is configured through the Codiac web UI at app.codiac.io. Navigate to your asset and configure scaling parameters in the visual interface.

What Is Autoscaling?

Autoscaling automatically adjusts the number of running pods (replicas) for an asset based on observed metrics. When load increases, Kubernetes adds more replicas. When load decreases, it scales back down.

Business Value:

  • Cost optimization: Scale down during low traffic, reduce cloud spending by 40-60%
  • Performance: Scale up automatically during demand spikes, maintain responsiveness
  • Zero manual intervention: No need to manually adjust replica counts
  • Resilience: Handle unexpected traffic bursts without capacity planning

How Autoscaling Works

Low Traffic (10% CPU)                 High Traffic (80% CPU)
┌───────┐                             ┌───────┐ ┌───────┐ ┌───────┐
│ Pod 1 │         Scales Up →         │ Pod 1 │ │ Pod 2 │ │ Pod 3 │
└───────┘                             └───────┘ └───────┘ └───────┘
1 replica                             3 replicas

     Load Decreases                        Load Increases Again
           ↓                                        ↑
     Scales Down  ←  [Cooldown Period]  →  Scales Up

Key Concepts (each maps to a field in the manifest sketch after this list):

  • Min Replicas: Minimum number of pods (never scale below this)
  • Max Replicas: Maximum number of pods (never scale above this)
  • Target Metric: CPU/memory utilization percentage that triggers scaling
  • Cooldown Period: Wait time between scaling actions to prevent thrashing
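
These four knobs correspond directly to fields on the Kubernetes HorizontalPodAutoscaler resource. As a minimal sketch (the asset name my-api and the exact values are hypothetical), the HPA behind an autoscaled asset looks roughly like this:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api                        # hypothetical asset name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api
  minReplicas: 2                      # Min Replicas
  maxReplicas: 10                     # Max Replicas
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70      # Target Metric
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Cooldown Period for scale-downs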

Supported Metrics

| Metric Type | What It Measures | When to Use |
|---|---|---|
| CPU | Average CPU utilization across all pods | Most common; works for CPU-bound apps (APIs, web servers) |
| Memory | Average memory usage across all pods | Memory-intensive apps (caching, data processing) |
| Custom Metrics | Application-specific metrics (requests/sec, queue depth) | Advanced scenarios requiring domain-specific scaling |
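
In HPA terms, CPU and memory are both "Resource" metrics, and a single autoscaler can target one or both. A sketch of the metrics stanza (thresholds are illustrative):

metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70     # scale when average CPU exceeds 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80     # scale when average memory exceeds 80%

When multiple metrics are configured, HPA computes a desired replica count for each and uses the highest.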

Configuring Autoscaling in Web UI

Manage autoscaling through Codiac's visual interface with real-time scaling metrics.

Step 1: Navigate to Asset

  1. Open Codiac web UI at https://app.codiac.io
  2. Select your Enterprise
  3. Select the Environment
  4. Navigate to the Cabinet
  5. Click on your Asset

Step 2: Configure Autoscaler

  1. Click Scaling or Configure tab
  2. Toggle Enable Autoscaling to ON
  3. Configure scaling parameters:

Min Replicas:

  • Minimum number of pods to maintain
  • Recommended: At least 2 for high availability
  • Never scales below this number

Max Replicas:

  • Maximum number of pods allowed
  • Set based on expected peak load
  • Prevents runaway scaling costs

Target CPU Utilization (%):

  • CPU percentage that triggers scaling
  • Recommended: 60-80% for APIs, 80-90% for batch jobs
  • Leave blank to disable CPU-based scaling

Target Memory Utilization (%):

  • Memory percentage that triggers scaling
  • Recommended: 70-85%
  • Leave blank to disable memory-based scaling

Step 3: Save and Deploy

  1. Click Save to persist autoscaler configuration
  2. Click Deploy to apply changes
  3. Monitor deployment progress

Expected Outcome:

  • HPA created in Kubernetes
  • Asset scales based on configured metrics
  • Real-time scaling visible in UI dashboard
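
You can also verify the HPA from the cluster side with kubectl (asset and namespace names are placeholders):

# List autoscalers with current vs. target metrics and replica counts
kubectl get hpa -n <namespace>

# Inspect thresholds, current utilization, and recent scaling events
kubectl describe hpa <asset-name> -n <namespace>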

Example Configurations

High-Traffic API

| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 3 | High availability across 3 AZs |
| Max Replicas | 50 | Handle 10x normal traffic |
| CPU Target | 60% | Keep latency low |

Background Worker

| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 1 | Cost optimization |
| Max Replicas | 10 | Limit concurrent jobs |
| CPU Target | 85% | Maximize resource usage |

Caching Layer

| Setting | Value | Rationale |
|---|---|---|
| Min Replicas | 2 | High availability |
| Max Replicas | 10 | Memory-bound |
| Memory Target | 75% | Scale on memory pressure |
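
As a sketch, the caching-layer configuration above corresponds to a memory-based HPA like the following (the name cache is hypothetical):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: cache                      # hypothetical asset name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: cache
  minReplicas: 2                   # HA
  maxReplicas: 10                  # memory-bound ceiling
  metrics:
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75   # scale on memory pressure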

Monitoring Autoscaler Activity

Scaling Metrics Dashboard

  1. Navigate to asset in web UI
  2. View Scaling Metrics section
  3. See real-time data:
    • Current replica count
    • Current CPU/memory utilization
    • Target thresholds
    • Scaling history (last 24 hours)

Metrics Displayed:

  • Current Replicas: Number of running pods
  • CPU Usage: Average across all pods (e.g., "68% / 70% target")
  • Memory Usage: Average across all pods
  • Last Scaled: Timestamp and action (scaled up/down, from X to Y replicas)

Scaling Event History

View historical scaling events:

  1. Click History or Events tab
  2. Filter by "Scaling Events"
  3. See timeline of scale-up/scale-down actions

Example Events:

  • 14:32 - Scaled up from 3 to 5 replicas (CPU: 82%)
  • 15:45 - Scaled down from 5 to 4 replicas (CPU: 58%)
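
The same events are visible from the cluster side if you have kubectl access (names are placeholders):

# Scaling decisions appear in the HPA's event stream
kubectl describe hpa <asset-name> -n <namespace>

# Or query scaling events directly
kubectl get events -n <namespace> \
  --field-selector involvedObject.kind=HorizontalPodAutoscaler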

Editing Autoscaler Configuration

  1. Navigate to asset Scaling tab
  2. Modify min/max replicas or target percentages
  3. Click Save and Deploy
  4. Changes take effect within 1-2 minutes

Disabling Autoscaling

  1. Navigate to asset Scaling tab
  2. Toggle Enable Autoscaling to OFF
  3. Set Fixed Replicas count
  4. Click Save and Deploy

Result:

  • Autoscaler removed
  • Asset runs with fixed replica count
  • No automatic scaling
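
Under the hood this is roughly equivalent to deleting the HPA and pinning the workload's replica count (a sketch; names and count are placeholders):

# Remove the autoscaler
kubectl delete hpa <asset-name> -n <namespace>

# Pin a fixed replica count on the underlying Deployment
kubectl scale deployment <asset-name> --replicas=3 -n <namespace>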

Autoscaling Best Practices

1. Always Set Resource Requests

Autoscaling requires CPU/memory requests to be defined on containers:

resources:
  requests:
    cpu: 200m        # Required for CPU-based autoscaling
    memory: 256Mi    # Required for memory-based autoscaling
  limits:
    cpu: 500m
    memory: 512Mi

Why: HPA calculates utilization as (actual usage / requested resources). Without requests, HPA doesn't know what "70% CPU" means.
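
Concretely, HPA derives the desired replica count with the standard Kubernetes scaling formula:

desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization)

# Example: 3 replicas running at 90% CPU against a 70% target
# ceil(3 × 90 / 70) = ceil(3.86) = 4 replicas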

2. Start Conservative, Then Optimize

Initial Configuration:

  • Min: 2-3 replicas (for HA)
  • Max: 2-3x your average expected load
  • CPU Target: 70%

Monitor for 1-2 weeks, then adjust:

  • Too many scale-ups? Lower target CPU
  • Too many scale-downs? Raise target CPU
  • Hitting max frequently? Increase max replicas

3. Set Appropriate Min/Max Bounds

Min Replicas:

  • Production: Minimum 2 for zero-downtime deployments
  • Staging/Dev: Can be 1 to save costs
  • HA-Critical Services: Minimum 3 across availability zones

Max Replicas:

  • Calculate: (Peak Traffic / Per-Pod Capacity) × 1.5 to allow a 50% buffer (worked example below)
  • Consider cost limits (max replicas × cost per pod)
  • Set alerts when approaching max
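
For example, with hypothetical traffic numbers:

# Hypothetical: peak traffic 6,000 req/s, each pod sustains ~200 req/s
# (6000 / 200) × 1.5 = 45  →  set Max Replicas to 45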

4. Use CPU for Most Use Cases

CPU-based scaling works best for:

  • Web APIs and microservices
  • Request-driven workloads
  • Most containerized applications

Memory-based scaling better for:

  • Caching layers (Redis, Memcached)
  • In-memory databases
  • Data processing pipelines

5. Avoid Autoscaling Stateful Workloads

Don't autoscale:

  • Databases (use read replicas instead)
  • Stateful sets with persistent volumes
  • Singleton services (queue consumers with global state)

Use autoscaling for:

  • Stateless APIs
  • Background workers (if idempotent)
  • Read-only services

6. Test Scaling Behavior

Before production, verify autoscaling works:

Load Test:

# Start a throwaway pod with an interactive shell
kubectl run -i --tty load-generator --rm --image=busybox --restart=Never -- /bin/sh

# Inside the pod, send continuous requests to drive CPU load on the target service
while true; do wget -q -O- http://my-api-service; done

Watch Scaling:

# Watch replica counts and utilization update live
kubectl get hpa -w -n prod

Expected: Replicas increase within 1-3 minutes as CPU rises.

7. Monitor Costs

Autoscaling can increase cloud costs if not monitored:

  • Set max replica limits
  • Use Zombie Mode for non-prod environments
  • Track cost per replica and set budgets
  • Alert on sustained high replica counts

Common Autoscaling Patterns

Pattern 1: User-Facing Web API

Min: 3 replicas (HA across 3 AZs)
Max: 50 replicas (handles 10x normal traffic)
CPU Target: 60% (low latency)

Pattern 2: Background Job Processor

Min: 1 replica (cost optimization)
Max: 20 replicas (limit concurrent jobs)
CPU Target: 85% (maximize resource usage)

Pattern 3: Caching Layer

Min: 2 replicas (HA)
Max: 10 replicas (memory-bound)
Memory Target: 75%

Pattern 4: Event-Driven Microservice

Min: 2 replicas (HA)
Max: 100 replicas (burst capacity)
CPU Target: 70%
Custom Metric: Messages in queue

Advanced: Custom Metrics

For autoscaling based on application-specific metrics (e.g., queue depth, requests/second), Kubernetes supports custom metrics via adapters.

Common Custom Metrics:

  • RabbitMQ queue depth
  • HTTP requests per second
  • Active connections
  • Custom business metrics

Setup: Requires installing a metrics adapter (Prometheus Adapter, Datadog, etc.) and configuring the HPA with a custom metric source.
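
Once an adapter is in place, the HPA references the custom metric in its spec. A sketch assuming an adapter that exposes a queue-depth metric (the metric name messages_in_queue and the threshold are hypothetical):

metrics:
  - type: External
    external:
      metric:
        name: messages_in_queue    # hypothetical adapter-exposed metric
      target:
        type: AverageValue
        averageValue: "30"         # aim for ~30 queued messages per pod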

Documentation: Kubernetes HPA Walkthrough


FAQ

Q: How quickly does autoscaling react to load changes?

A: HPA checks metrics every 15 seconds by default. Scale-up typically happens within 1-3 minutes. Scale-down is more conservative (a 5-minute stabilization window by default) to avoid thrashing.

Q: Can I schedule scaling (e.g., scale up during business hours)?

A: HPA is reactive, not scheduled. For predictive scaling, use CronJobs to adjust min/max replicas based on time of day, or use Kubernetes Vertical Pod Autoscaler (VPA) for right-sizing.
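
A common pattern for the CronJob approach is patching the HPA's minReplicas on a schedule. A minimal sketch, assuming a service account hpa-patcher with RBAC permission to patch HPAs (all names and the schedule are hypothetical):

apiVersion: batch/v1
kind: CronJob
metadata:
  name: scale-up-business-hours
spec:
  schedule: "0 8 * * 1-5"                 # 08:00, Monday-Friday
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hpa-patcher # needs RBAC to patch HPAs
          restartPolicy: OnFailure
          containers:
            - name: kubectl
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - patch
                - hpa
                - my-api
                - -p
                - '{"spec":{"minReplicas":5}}'

A matching evening CronJob would patch minReplicas back down after business hours.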

Q: What's the difference between HPA and VPA?

A: HPA (Horizontal Pod Autoscaler) adds/removes pods. VPA (Vertical Pod Autoscaler) adjusts CPU/memory limits per pod. HPA is more common.

Q: Does autoscaling work during deployments?

A: Yes, HPA continues to operate during rolling updates. New pods are added to the replica count as they become ready.

Q: Can I autoscale to zero replicas?

A: No, min replicas must be at least 1. For zero-scaling, use Knative or KEDA (event-driven autoscaling).



Need help optimizing autoscaling? Contact Support or check our performance tuning guide.