Health Probes
Ensure your applications are healthy and recover automatically from failures using Kubernetes health probes. Codiac simplifies probe configuration for both CLI and web UI workflows.
What Are Probes?
Health probes are automated checks that Kubernetes uses to monitor container health and make traffic routing decisions. Probes detect failures early and trigger automatic recovery actions.
Business Value:
- Automatic failure detection: Identify unhealthy containers before they impact users
- Self-healing infrastructure: Kubernetes automatically restarts failed containers
- Zero manual intervention: No on-call pages for transient failures
- Improved uptime: Only route traffic to healthy instances
Types of Probes
Codiac supports all three Kubernetes probe types:
1. Liveness Probe
Question: "Is the container alive and functioning?"
- Detects when a container is stuck, deadlocked, or in an unrecoverable state
- Action: Kubernetes restarts the container if liveness probe fails
- Use Case: Detect hangs such as deadlocks, infinite loops, or resource exhaustion that leave the process running but unresponsive
Example Scenario: Your API is running but stuck in an infinite loop. The liveness probe fails, so Kubernetes kills the container and starts a fresh instance.
2. Readiness Probe
Question: "Is the container ready to serve traffic?"
- Determines if a container should receive requests
- Action: Kubernetes removes the pod from service endpoints if readiness probe fails
- Use Case: Temporary unavailability (loading data, waiting for dependencies, warming up caches)
Example Scenario: During deployment, your application needs 30 seconds to populate its cache. The readiness probe ensures traffic only flows after the cache is ready.
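The cache warm-up scenario can be sketched with Python's standard library alone. This is a minimal illustration rather than a recommended server setup: `cache_ready`, `warm_cache`, and `make_server` are hypothetical names, and the real cache loading is left as a placeholder.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Set once the (hypothetical) cache has finished loading.
cache_ready = threading.Event()


class ReadyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/ready":
            self._reply(404, b"Not Found")
        elif cache_ready.is_set():
            self._reply(200, b"Ready")
        else:
            # 503 tells the readiness probe to keep traffic away for now.
            self._reply(503, b"Not Ready")

    def _reply(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep frequent probe requests out of the logs


def warm_cache():
    # Placeholder: load real data here, then flip the flag.
    cache_ready.set()


def make_server(host="127.0.0.1", port=0):
    # Port 0 asks the OS for any free port; pass a fixed port in practice.
    return ThreadingHTTPServer((host, port), ReadyHandler)
```

Inside a container the server would normally bind to `0.0.0.0` so the probe can reach it, and `warm_cache()` would run in a background thread while the server is already answering `/ready` with 503.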
3. Startup Probe
Question: "Has the container finished initializing?"
- Gives slow-starting containers extra time to start before liveness checks begin
- Action: Disables liveness and readiness probes until startup succeeds
- Use Case: Applications with long initialization (loading large datasets, database migrations)
Example Scenario: Your ML model takes 2 minutes to load into memory at startup. The startup probe gives it time to initialize before liveness probes start.
Probe Check Methods
Each probe can use one of three check methods:
| Method | How It Works | Best For |
|---|---|---|
| HTTP GET | Sends HTTP request to a specific path | Web servers, REST APIs |
| TCP Socket | Attempts TCP connection to a port | Databases, non-HTTP services |
| Exec Command | Runs a command inside the container | Custom health logic |
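To make the three methods concrete, here is a rough Python approximation of what each check does. It is only a sketch for experimenting locally, not how Kubernetes or Codiac implements probes; the function names are made up for this example. The HTTP rule (any status from 200 to 399 counts as success) follows standard Kubernetes behavior.

```python
import http.client
import socket
import subprocess
import urllib.error
import urllib.request

# Ignore proxy environment variables so local checks go straight to the target.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def http_check(url: str, timeout: float = 1.0) -> bool:
    """HTTP GET check: any status from 200 to 399 counts as success."""
    try:
        with _opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Covers 4xx/5xx responses, refused connections, and timeouts.
        return False


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCP socket check: success means the connection could be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def exec_check(command: list[str], timeout: float = 1.0) -> bool:
    """Exec check: success means the command exited with code 0."""
    try:
        return subprocess.run(command, timeout=timeout).returncode == 0
    except (subprocess.TimeoutExpired, OSError):
        return False
```

For example, `tcp_check("localhost", 5432)` reports whether something is accepting connections on the PostgreSQL port, without saying anything about whether queries would succeed.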
How Probes Work Together
Container Starts
       ↓
[Startup Probe] ────────> Success ───┐
       ↓                             │
Failure (restart)                    │
                                     ↓
               [Liveness Probe] + [Readiness Probe]
                      ↓                   ↓
                Still alive?      Ready for traffic?
                      ↓                   ↓
                   Yes/No              Yes/No
Key Rules:
- Liveness and readiness probes don't run until startup probe succeeds (if configured)
- Readiness failures remove the pod from Service endpoints until the probe passes again (temporary; the container keeps running)
- Liveness failures trigger a container restart
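These rules can be read as a small decision function. The sketch below is a simplified model for reasoning about outcomes, not Kubernetes code: it ignores thresholds and timing, and the `startup` state names are invented for the example.

```python
from typing import Optional


def probe_outcome(startup: Optional[str], liveness_ok: bool, readiness_ok: bool) -> dict:
    """Model what happens to a container given its current probe results.

    startup is None (no startup probe configured), "pending",
    "failed" (failure threshold exhausted), or "succeeded".
    """
    if startup == "failed":
        return {"restart": True, "receives_traffic": False}
    if startup == "pending":
        # Liveness and readiness are not evaluated until startup succeeds.
        return {"restart": False, "receives_traffic": False}
    if not liveness_ok:
        return {"restart": True, "receives_traffic": False}
    # Alive: traffic depends only on readiness.
    return {"restart": False, "receives_traffic": readiness_ok}


# A live container that is not ready keeps running but gets no traffic.
print(probe_outcome("succeeded", liveness_ok=True, readiness_ok=False))
```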
Creating Probes with CLI
Use the cod asset probe create command to configure health checks for your assets.
Basic Syntax
cod asset probe create [COMMAND] [FLAGS]
HTTP Probe Example
Configure an HTTP liveness probe that checks the /health endpoint:
cod asset probe create \
--asset my-api \
--cabinet prod \
--type liveness \
--method http \
--path /health \
--port 3000 \
--initial-delay 10 \
--period 10 \
--timeout 5 \
--failure-threshold 3
Expected Outcome: Kubernetes will:
- Wait 10 seconds after container starts
- Send HTTP GET to http://container:3000/health every 10 seconds
- Time out after 5 seconds if there is no response
- Restart container after 3 consecutive failures
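As a rough worked example of these timings, the function below estimates how long a container that never responds would run before being restarted. It is an approximation under stated assumptions: every probe times out, the first probe fires right at the initial delay, and scheduling jitter is ignored.

```python
def seconds_until_restart(initial_delay: int, period: int, timeout: int,
                          failure_threshold: int) -> int:
    """Approximate time from container start to restart when every probe times out."""
    # Probes start at initial_delay and repeat every `period` seconds.
    last_failed_probe_starts = initial_delay + (failure_threshold - 1) * period
    # The final probe still has to run into its timeout before it counts as failed.
    return last_failed_probe_starts + timeout


# Values from the HTTP probe example: 10 + 2 * 10 + 5
print(seconds_until_restart(initial_delay=10, period=10, timeout=5, failure_threshold=3))  # 35
```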
TCP Socket Probe Example
Check if a database port is accepting connections:
cod asset probe create \
--asset postgres-db \
--cabinet prod \
--type readiness \
--method tcp \
--port 5432 \
--initial-delay 5 \
--period 5
Exec Command Probe Example
Run a custom health check script:
cod asset probe create \
--asset batch-processor \
--cabinet prod \
--type liveness \
--method exec \
--command "/app/healthcheck.sh" \
--initial-delay 30 \
--period 20 \
--timeout 10
Script Requirements:
- Exit code 0 = healthy
- Any other exit code = unhealthy
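One common shape for such a script is a heartbeat check: the worker touches a file periodically, and the probe fails when the file goes stale. The sketch below assumes that pattern; the file path and the 60-second limit are hypothetical values, not Codiac defaults.

```python
import os
import time
from typing import Optional


def heartbeat_is_fresh(path: str, max_age_seconds: float,
                       now: Optional[float] = None) -> bool:
    """True if the file at `path` was modified within the last max_age_seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age_seconds
    except OSError:
        # A missing or unreadable heartbeat file counts as unhealthy.
        return False


def exit_code(path: str, max_age_seconds: float = 60,
              now: Optional[float] = None) -> int:
    """Map the check result to the exit code the probe expects."""
    return 0 if heartbeat_is_fresh(path, max_age_seconds, now) else 1


# As a probe script, the file would end with something like:
#     import sys
#     sys.exit(exit_code("/tmp/worker-heartbeat"))
```

Keeping the logic in plain functions makes the check easy to test outside the container before wiring it into a probe.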
Probe Configuration Options
| Flag | Description | Default | Example |
|---|---|---|---|
| --asset | Asset name | Required | my-api |
| --cabinet | Cabinet name | Required | prod |
| --type | Probe type: liveness, readiness, startup | Required | liveness |
| --method | Check method: http, tcp, exec | Required | http |
| --path | HTTP path (for HTTP method) | / | /health |
| --port | Container port to check | Required | 3000 |
| --command | Command to execute (for exec method) | - | /app/check.sh |
| --initial-delay | Seconds to wait before first probe | 0 | 10 |
| --period | Seconds between probes | 10 | 15 |
| --timeout | Seconds before probe times out | 1 | 5 |
| --success-threshold | Consecutive successes to mark healthy | 1 | 1 |
| --failure-threshold | Consecutive failures to mark unhealthy | 3 | 3 |
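For readers who know the Kubernetes probe fields, the flags line up with them fairly directly. The sketch below builds the standard Kubernetes probe structure from values named like the flags. The field names (`httpGet`, `periodSeconds`, and so on) are standard Kubernetes; that Codiac produces exactly this structure is an assumption made for illustration.

```python
import json


def probe_spec(method, port=None, path="/", command=None, initial_delay=0,
               period=10, timeout=1, success_threshold=1, failure_threshold=3):
    """Build a Kubernetes-style probe definition as a plain dict."""
    if method == "http":
        handler = {"httpGet": {"path": path, "port": port}}
    elif method == "tcp":
        handler = {"tcpSocket": {"port": port}}
    elif method == "exec":
        handler = {"exec": {"command": list(command)}}
    else:
        raise ValueError(f"unknown method: {method}")
    return {
        **handler,
        "initialDelaySeconds": initial_delay,
        "periodSeconds": period,
        "timeoutSeconds": timeout,
        "successThreshold": success_threshold,
        "failureThreshold": failure_threshold,
    }


# The HTTP liveness example from earlier, expressed as a probe definition.
print(json.dumps(probe_spec("http", port=3000, path="/health",
                            initial_delay=10, timeout=5), indent=2))
```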
Best Practice Examples
Fast-Starting Web Application
# Liveness: Check HTTP endpoint every 10s
cod asset probe create \
--asset web-app \
--cabinet prod \
--type liveness \
--method http \
--path /healthz \
--port 8080 \
--period 10 \
--failure-threshold 3
# Readiness: Wait for dependencies
cod asset probe create \
--asset web-app \
--cabinet prod \
--type readiness \
--method http \
--path /ready \
--port 8080 \
--initial-delay 5 \
--period 5
Slow-Starting Database
# Startup: Give it 5 minutes to initialize
cod asset probe create \
--asset postgres \
--cabinet prod \
--type startup \
--method tcp \
--port 5432 \
--initial-delay 0 \
--period 10 \
--failure-threshold 30 # 30 * 10s = 5 min max startup time
# Liveness: Once started, check every 30s
cod asset probe create \
--asset postgres \
--cabinet prod \
--type liveness \
--method tcp \
--port 5432 \
--period 30 \
--failure-threshold 3
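The startup budget is simply the period multiplied by the failure threshold, so the threshold for a given budget can be worked out backwards. A small helper, written here only to illustrate the arithmetic:

```python
import math


def startup_failure_threshold(max_startup_seconds: float, period_seconds: int) -> int:
    """Smallest failure threshold that allows at least max_startup_seconds to start."""
    return math.ceil(max_startup_seconds / period_seconds)


print(startup_failure_threshold(300, 10))  # 30 (a 5-minute budget)
print(startup_failure_threshold(180, 10))  # 18 (a 3-minute budget)
```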
Machine Learning Service
# Startup: Allow 3 minutes for model loading
cod asset probe create \
--asset ml-inference \
--cabinet prod \
--type startup \
--method http \
--path /startup \
--port 5000 \
--period 10 \
--failure-threshold 18 # 18 * 10s = 3 min max
# Readiness: Check model is loaded
cod asset probe create \
--asset ml-inference \
--cabinet prod \
--type readiness \
--method http \
--path /ready \
--port 5000 \
--period 5
# Liveness: Ensure model server is responsive
cod asset probe create \
--asset ml-inference \
--cabinet prod \
--type liveness \
--method http \
--path /health \
--port 5000 \
--period 15 \
--timeout 10
Viewing Configured Probes
Check which probes are configured for an asset:
cod asset view my-api --cabinet prod
Expected Output:
Asset: my-api
Cabinet: prod
Probes:
Liveness: HTTP GET :3000/health (every 10s, threshold: 3)
Readiness: HTTP GET :3000/ready (every 5s, threshold: 3)
Troubleshooting Probe Failures
Check Pod Events
kubectl get events --field-selector involvedObject.name=my-api-pod
Look for:
- Unhealthy - Probe is failing
- Killing - Liveness probe triggered restart
- Readiness probe failed - Pod removed from service
View Container Logs
cod asset logs my-api --cabinet prod
Common Issues
Problem: Liveness probe fails immediately after deployment
Solution: Increase --initial-delay to give the container time to start
Problem: Readiness probe never succeeds
Solution: Check if your /ready endpoint exists and returns 200 OK
Problem: Container restarts in a loop
Solution: Liveness probe may be too aggressive. Increase --period or --failure-threshold
Creating Probes in Web UI
Configure health probes through Codiac's visual interface.
Step 1: Navigate to Asset Configuration
- Open Codiac web UI at https://app.codiac.io
- Select your Enterprise
- Select the Environment containing your asset
- Navigate to the Cabinet
- Click on your Asset
Step 2: Add Probe
- Click Configure or Probes tab
- Click Add Probe button
- Select probe type from dropdown:
- Liveness Probe
- Readiness Probe
- Startup Probe
Step 3: Configure Probe Settings
HTTP Probe Configuration
- Check Method: Select "HTTP"
- Path: Enter endpoint path (e.g., /health, /healthz)
- Port: Enter container port (e.g., 3000, 8080)
- HTTP Headers: (Optional) Add custom headers if needed
Timing Settings:
- Initial Delay: Seconds to wait before first check
- Period: Seconds between checks
- Timeout: Seconds before request times out
- Success Threshold: Consecutive successes needed
- Failure Threshold: Consecutive failures before action
TCP Socket Probe Configuration
- Check Method: Select "TCP"
- Port: Enter port number to check (e.g., 5432 for PostgreSQL)
- Configure timing settings (same as HTTP)
Exec Command Probe Configuration
- Check Method: Select "Exec"
- Command: Enter full command to execute
- Use full paths (e.g., /app/healthcheck.sh)
- Command must exit with code 0 for success
- Configure timing settings (same as HTTP)
Step 4: Save and Deploy
- Click Save to persist probe configuration
- Click Deploy to apply changes to running asset
- Monitor deployment status
Expected Outcome:
- Probe configuration saved to asset definition
- Updated deployment includes health checks
- Kubernetes begins monitoring container health
Viewing Probe Status
Asset Health Dashboard
- Navigate to your asset in the web UI
- View Health Status section
- See real-time probe results:
- ✅ Green = Probe passing
- ⚠️ Yellow = Probe warning
- ❌ Red = Probe failing
Probe History
View historical probe failures:
- Click History or Events tab
- Filter by "Probe Failures"
- See timestamps and failure counts
Editing Existing Probes
- Navigate to asset configuration
- Find probe in Probes section
- Click Edit icon
- Modify settings
- Click Save and Deploy
Removing Probes
- Navigate to asset configuration
- Find probe in Probes section
- Click Delete or Remove icon
- Confirm deletion
- Deploy changes to update running asset
Probe Best Practices
1. Always Configure Liveness and Readiness
At minimum, every production asset should have:
- Liveness probe - Detect and recover from crashes
- Readiness probe - Avoid sending traffic to initializing pods
2. Make Probes Lightweight
Probe checks run frequently. Keep them fast and simple:
- ✅ Good: Simple HTTP endpoint that returns 200 OK
- ❌ Bad: Full database query or complex computation
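When a probe genuinely needs to reflect a dependency, one way to keep it cheap is to cache the result for a few seconds so that frequent probes do not each trigger a real check. The wrapper below is a minimal sketch of that idea; `CachedCheck` is a name invented for this example, and the 5-second default is an arbitrary choice.

```python
import threading
import time
from typing import Callable


class CachedCheck:
    """Wrap an expensive health check so it runs at most once per TTL window."""

    def __init__(self, check: Callable[[], bool], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._expires_at = None
        self._last_result = False

    def __call__(self) -> bool:
        with self._lock:
            now = self._clock()
            if self._expires_at is None or now >= self._expires_at:
                try:
                    self._last_result = bool(self._check())
                except Exception:
                    # A check that raises is treated as unhealthy.
                    self._last_result = False
                self._expires_at = now + self._ttl
            return self._last_result
```

A probe handler would then call the wrapped check (for example `db_ok = CachedCheck(ping_database)`, where `ping_database` is your own function) and return 200 or 503 based on the cached value.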
3. Use Startup Probes for Slow Initialization
If your application takes >30 seconds to start:
- Configure a startup probe with generous failure-threshold
- This prevents liveness probe from killing the pod during startup
4. Set Appropriate Timeouts
Too Short: False positives, unnecessary restarts
Too Long: Slow failure detection, degraded user experience
Recommended:
- Fast HTTP APIs: 5-10 second timeout
- Databases: 10-30 second timeout
- Complex services: 30-60 second timeout
5. Use Different Endpoints for Liveness vs. Readiness
Liveness: "Am I alive?" (simple, always passes unless broken)
Readiness: "Am I ready?" (checks dependencies, may fail temporarily)
Example:
// Liveness: Super simple
app.get('/health', (req, res) => res.status(200).send('OK'));

// Readiness: Check dependencies
app.get('/ready', async (req, res) => {
  const dbOk = await checkDatabase();
  const cacheOk = await checkCache();
  if (dbOk && cacheOk) {
    res.status(200).send('Ready');
  } else {
    res.status(503).send('Not Ready');
  }
});
6. Test Probe Endpoints Locally
Before deploying, verify your health check endpoints work:
# Test HTTP probe endpoint
curl -i http://localhost:3000/health
# The status line in the output should show 200 OK
7. Monitor Probe Failures
Track probe failure rates:
- Frequent liveness failures = application bugs or resource issues
- Frequent readiness failures = dependency problems or slow startup
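One low-tech way to track these rates is to count `Unhealthy` events by probe type from the JSON that `kubectl get events -o json` prints. The sketch below assumes the usual event message format ("Liveness probe failed: ..."); the sample data is invented for illustration.

```python
from collections import Counter


def count_probe_failures(events: dict) -> Counter:
    """Count Unhealthy events per probe type from a kubectl events JSON document."""
    counts = Counter()
    for item in events.get("items", []):
        if item.get("reason") != "Unhealthy":
            continue
        message = item.get("message", "")
        # Messages normally start with "<Type> probe failed: ...".
        kind = message.split(" probe ", 1)[0].lower() if " probe " in message else "unknown"
        # Repeated events are aggregated into a single item with a count.
        counts[kind] += item.get("count") or 1
    return counts


# Invented sample shaped like `kubectl get events -o json` output.
sample = {
    "items": [
        {"reason": "Unhealthy", "message": "Liveness probe failed: HTTP probe failed with statuscode: 500", "count": 4},
        {"reason": "Unhealthy", "message": "Readiness probe failed: connection refused", "count": 2},
        {"reason": "Killing", "message": "Container my-api failed liveness probe, will be restarted"},
    ]
}
print(count_probe_failures(sample))
```

In practice the input would come from `json.load(sys.stdin)` with the kubectl output piped in.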
Common Probe Patterns
API Service
Liveness: HTTP GET /health every 10s
Readiness: HTTP GET /ready every 5s (checks DB connection)
Database
Startup: TCP :5432 every 10s, 30 failures allowed (5 min max)
Liveness: TCP :5432 every 30s
Readiness: TCP :5432 every 10s
Background Worker
Liveness: Exec "/app/check-process.sh" every 30s
Stateless Web App
Liveness: HTTP GET /healthz every 15s
Readiness: HTTP GET /healthz every 5s (same endpoint, different timing)
Health Check Endpoint Implementation
Node.js / Express
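const express = require('express');
const app = express();

// `db` stands in for your database client; create it before the routes are registered.

// Liveness: confirms the process can respond
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// Readiness: returns 503 while the database is unreachable
app.get('/ready', async (req, res) => {
  try {
    await db.ping();
    res.status(200).send('Ready');
  } catch (error) {
    res.status(503).send('Not Ready');
  }
});

app.listen(3000);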
Python / Flask
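from flask import Flask

app = Flask(__name__)

# `db` stands in for your database client; create it during application startup.


# Liveness: confirms the process can respond
@app.route('/health')
def health():
    return 'OK', 200


# Readiness: returns 503 while the database is unreachable
@app.route('/ready')
def ready():
    try:
        db.ping()
        return 'Ready', 200
    except Exception:
        return 'Not Ready', 503


if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)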
Go / net/http
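package main

import (
	"database/sql"
	"log"
	"net/http"
)

// db is assumed to be opened during startup (for example with sql.Open
// and a registered driver); that setup is omitted here.
var db *sql.DB

// healthHandler backs the liveness probe: it confirms the process can respond.
func healthHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(http.StatusOK)
	w.Write([]byte("OK"))
}

// readyHandler backs the readiness probe: it returns 503 while the database is unreachable.
func readyHandler(w http.ResponseWriter, r *http.Request) {
	if db.Ping() == nil {
		w.WriteHeader(http.StatusOK)
		w.Write([]byte("Ready"))
	} else {
		w.WriteHeader(http.StatusServiceUnavailable)
		w.Write([]byte("Not Ready"))
	}
}

func main() {
	http.HandleFunc("/health", healthHandler)
	http.HandleFunc("/ready", readyHandler)
	// ListenAndServe only returns on error, so log it and exit.
	log.Fatal(http.ListenAndServe(":8080", nil))
}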
Troubleshooting Guide
Symptom: Container restarts repeatedly
Possible Causes:
- Liveness probe failing
- Initial delay too short
- Health endpoint doesn't exist
Solutions:
- Check container logs: cod asset logs
- Verify health endpoint works
- Increase --initial-delay
- Increase --failure-threshold
Symptom: No traffic reaching pods
Possible Causes:
- Readiness probe failing
- Application not fully initialized
Solutions:
- Check readiness probe configuration
- Verify /ready endpoint returns 200
- Check for dependency issues (database, cache, etc.)
Symptom: Probes time out
Possible Causes:
- Application is slow or overloaded
- Timeout setting too aggressive
Solutions:
- Increase --timeout value
- Optimize health check endpoint
- Check resource limits (CPU, memory)
Related Documentation
- Asset Management CLI Guide
- Deploy Applications
- Troubleshooting Deployments
- Autoscalers
- Glossary: Probe
FAQ
Q: What's the difference between liveness and readiness probes?
A: Liveness checks if the container is alive (restarts if not). Readiness checks if the container should receive traffic (removes from load balancer if not ready).
Q: How often should probes run?
A: Liveness: Every 10-30 seconds. Readiness: Every 5-10 seconds. Startup: Every 5-10 seconds.
Q: Can I use the same endpoint for liveness and readiness?
A: You can, but it's better to separate them. Readiness should check dependencies; liveness should be simpler.
Q: What happens if all probes fail?
A: Liveness failure → Container restarts. Readiness failure → Pod removed from service. Startup failure → Container eventually restarts.
Q: Do probes cost extra?
A: No, probes are built into Kubernetes and Codiac. No additional cost.
Need help configuring probes? Contact Support or check our troubleshooting guide.