Health Probes

Ensure your applications are healthy and recover automatically from failures using Kubernetes health probes. Codiac simplifies probe configuration for both CLI and web UI workflows.

What Are Probes?

Health probes are automated checks that Kubernetes uses to monitor container health and make traffic routing decisions. Probes detect failures early and trigger automatic recovery actions.

Business Value:

  • Automatic failure detection: Identify unhealthy containers before they impact users
  • Self-healing infrastructure: Kubernetes automatically restarts failed containers
  • Zero manual intervention: No on-call pages for transient failures
  • Improved uptime: Only route traffic to healthy instances

Types of Probes

Codiac supports all three Kubernetes probe types:

1. Liveness Probe

Question: "Is the container alive and functioning?"

  • Detects when a container is stuck, deadlocked, or in an unrecoverable state
  • Action: Kubernetes restarts the container if liveness probe fails
  • Use Case: Detect application crashes, infinite loops, or resource exhaustion

Example Scenario: Your API is running but stuck in an infinite loop. The liveness probe fails, Kubernetes kills the container and starts a fresh instance.

2. Readiness Probe

Question: "Is the container ready to serve traffic?"

  • Determines if a container should receive requests
  • Action: Kubernetes removes the pod from service endpoints if readiness probe fails
  • Use Case: Temporary unavailability (loading data, waiting for dependencies, warming up caches)

Example Scenario: During deployment, your application needs 30 seconds to populate its cache. The readiness probe ensures traffic only flows after the cache is ready.

3. Startup Probe

Question: "Has the container finished initializing?"

  • Gives slow-starting containers extra time to start before liveness checks begin
  • Action: Disables liveness and readiness probes until startup succeeds
  • Use Case: Applications with long initialization (loading large datasets, database migrations)

Example Scenario: Your ML model takes 2 minutes to load into memory at startup. The startup probe gives it time to initialize before liveness probes start.


Probe Check Methods

Each probe can use one of three check methods:

Method         How It Works                             Best For
HTTP GET       Sends an HTTP request to a given path    Web servers, REST APIs
TCP Socket     Attempts a TCP connection to a port      Databases, non-HTTP services
Exec Command   Runs a command inside the container      Custom health logic
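For intuition, the three check methods can be sketched in Python against a throwaway local HTTP server. This is an illustrative model of what Kubernetes does, not Codiac or Kubernetes code; the server, port, and `/bin/true` command are assumptions made for the sketch.

```python
import http.server
import socket
import subprocess
import threading
import urllib.request

# Throwaway local server standing in for the container under probe.
class Handler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"OK")

    def log_message(self, *args):
        pass  # keep output quiet

server = http.server.HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

# HTTP GET: healthy if the response status is a success code (200-399).
status = urllib.request.urlopen(f"http://127.0.0.1:{port}/health", timeout=5).status
http_healthy = 200 <= status < 400

# TCP Socket: healthy if the connection attempt succeeds.
socket.create_connection(("127.0.0.1", port), timeout=5).close()
tcp_healthy = True

# Exec Command: healthy if the command exits with code 0.
exec_healthy = subprocess.run(["/bin/true"]).returncode == 0

server.shutdown()
print(http_healthy, tcp_healthy, exec_healthy)
```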

How Probes Work Together

Container starts
       |
       v
[Startup Probe] --failure--> container restarted
       |
    success
       |
       v
[Liveness Probe]              [Readiness Probe]
       |                             |
  Still alive?                Ready for traffic?
       |                             |
  No -> restart container     No -> remove from endpoints

Key Rules:

  • If a startup probe is configured, liveness and readiness probes don't run until it succeeds
  • Readiness failures remove the pod from service endpoints (temporary; traffic resumes once the probe passes)
  • Liveness failures restart the container (recovery by replacement)

Creating Probes with CLI

Use the cod asset probe create command to configure health checks for your assets.

Basic Syntax

cod asset probe create [FLAGS]

HTTP Probe Example

Configure an HTTP liveness probe that checks /health endpoint:

cod asset probe create \
--asset my-api \
--cabinet prod \
--type liveness \
--method http \
--path /health \
--port 3000 \
--initial-delay 10 \
--period 10 \
--timeout 5 \
--failure-threshold 3

Expected Outcome: Kubernetes will:

  • Wait 10 seconds after container starts
  • Send HTTP GET to http://container:3000/health every 10 seconds
  • Timeout after 5 seconds if no response
  • Restart container after 3 consecutive failures
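The failure-threshold behavior above can be sketched as a counter that resets on success; this is an illustrative model, not Kubernetes source code.

```python
# Illustrative model: consecutive failures are counted, and a single
# successful probe resets the counter.
def probe_decisions(results, failure_threshold=3):
    """Return the index at which the threshold is hit (for a liveness
    probe, the point the container would be restarted), or None."""
    failures = 0
    for i, ok in enumerate(results):
        failures = 0 if ok else failures + 1
        if failures >= failure_threshold:
            return i
    return None

# Two failures, a success, then three in a row: only the unbroken
# three-failure streak triggers action.
print(probe_decisions([True, False, False, True, False, False, False]))  # 6
```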

TCP Socket Probe Example

Check if a database port is accepting connections:

cod asset probe create \
--asset postgres-db \
--cabinet prod \
--type readiness \
--method tcp \
--port 5432 \
--initial-delay 5 \
--period 5

Exec Command Probe Example

Run a custom health check script:

cod asset probe create \
--asset batch-processor \
--cabinet prod \
--type liveness \
--method exec \
--command "/app/healthcheck.sh" \
--initial-delay 30 \
--period 20 \
--timeout 10

Script Requirements:

  • Exit code 0 = healthy
  • Any other exit code = unhealthy

Probe Configuration Options

Flag                  Description                                Default    Example
--asset               Asset name                                 Required   my-api
--cabinet             Cabinet name                               Required   prod
--type                Probe type: liveness, readiness, startup   Required   liveness
--method              Check method: http, tcp, exec              Required   http
--path                HTTP path (for HTTP method)                /          /health
--port                Container port to check                    Required   3000
--command             Command to execute (for exec method)       -          /app/check.sh
--initial-delay       Seconds to wait before first probe         0          10
--period              Seconds between probes                     10         15
--timeout             Seconds before probe times out             1          5
--success-threshold   Consecutive successes to mark healthy      1          1
--failure-threshold   Consecutive failures to mark unhealthy     3          3

Best Practice Examples

Fast-Starting Web Application

# Liveness: Check HTTP endpoint every 10s
cod asset probe create \
--asset web-app \
--type liveness \
--method http \
--path /healthz \
--port 8080 \
--period 10 \
--failure-threshold 3

# Readiness: Wait for dependencies
cod asset probe create \
--asset web-app \
--type readiness \
--method http \
--path /ready \
--port 8080 \
--initial-delay 5 \
--period 5

Slow-Starting Database

# Startup: Give it 5 minutes to initialize
cod asset probe create \
--asset postgres \
--type startup \
--method tcp \
--port 5432 \
--initial-delay 0 \
--period 10 \
--failure-threshold 30 # 30 * 10s = 5 min max startup time

# Liveness: Once started, check every 30s
cod asset probe create \
--asset postgres \
--type liveness \
--method tcp \
--port 5432 \
--period 30 \
--failure-threshold 3
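The comments in the startup examples use a rule of thumb for the startup budget: roughly initial-delay + period × failure-threshold (each attempt's own timeout can stretch this slightly). A quick check of the arithmetic:

```python
# Rough upper bound on startup time before the startup probe gives up,
# ignoring the per-attempt timeout.
def max_startup_seconds(initial_delay, period, failure_threshold):
    return initial_delay + period * failure_threshold

print(max_startup_seconds(0, 10, 30))  # postgres example: 300s = 5 min
print(max_startup_seconds(0, 10, 18))  # ML example: 180s = 3 min
```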

Machine Learning Service

# Startup: Allow 3 minutes for model loading
cod asset probe create \
--asset ml-inference \
--type startup \
--method http \
--path /startup \
--port 5000 \
--period 10 \
--failure-threshold 18 # 18 * 10s = 3 min max

# Readiness: Check model is loaded
cod asset probe create \
--asset ml-inference \
--type readiness \
--method http \
--path /ready \
--port 5000 \
--period 5

# Liveness: Ensure model server is responsive
cod asset probe create \
--asset ml-inference \
--type liveness \
--method http \
--path /health \
--port 5000 \
--period 15 \
--timeout 10

Viewing Configured Probes

Check which probes are configured for an asset:

cod asset view my-api --cabinet prod

Expected Output:

Asset: my-api
Cabinet: prod
Probes:
Liveness: HTTP GET :3000/health (every 10s, threshold: 3)
Readiness: HTTP GET :3000/ready (every 5s, threshold: 3)

Troubleshooting Probe Failures

Check Pod Events

kubectl get events --field-selector involvedObject.name=my-api-pod

Look for:

  • Unhealthy - Probe is failing
  • Killing - Liveness probe triggered restart
  • Readiness probe failed - Pod removed from service

View Container Logs

cod asset logs my-api --cabinet prod

Common Issues

Problem: Liveness probe fails immediately after deployment

Solution: Increase --initial-delay to give the container time to start

Problem: Readiness probe never succeeds

Solution: Check if your /ready endpoint exists and returns 200 OK

Problem: Container restarts in a loop

Solution: Liveness probe may be too aggressive. Increase --period or --failure-threshold


Probe Best Practices

1. Always Configure Liveness and Readiness

At minimum, every production asset should have:

  • Liveness probe - Detect and recover from crashes
  • Readiness probe - Avoid sending traffic to initializing pods

2. Make Probes Lightweight

Probe checks run frequently. Keep them fast and simple:

  • Good: Simple HTTP endpoint that returns 200 OK
  • Bad: Full database query or complex computation

3. Use Startup Probes for Slow Initialization

If your application takes >30 seconds to start:

  • Configure a startup probe with generous failure-threshold
  • This prevents liveness probe from killing the pod during startup

4. Set Appropriate Timeouts

Too Short: False positives, unnecessary restarts
Too Long: Slow failure detection, degraded user experience

Recommended:

  • Fast HTTP APIs: 5-10 second timeout
  • Databases: 10-30 second timeout
  • Complex services: 30-60 second timeout

5. Use Different Endpoints for Liveness vs. Readiness

Liveness: "Am I alive?" (simple, always passes unless broken)
Readiness: "Am I ready?" (checks dependencies, may fail temporarily)

Example:

// Liveness: Super simple
app.get('/health', (req, res) => res.status(200).send('OK'));

// Readiness: Check dependencies
app.get('/ready', async (req, res) => {
  const dbOk = await checkDatabase();
  const cacheOk = await checkCache();

  if (dbOk && cacheOk) {
    res.status(200).send('Ready');
  } else {
    res.status(503).send('Not Ready');
  }
});

6. Test Probe Endpoints Locally

Before deploying, verify your health check endpoints work:

# Test HTTP probe endpoint (-i prints the status line)
curl -i http://localhost:3000/health

# Should report HTTP/1.1 200 OK

7. Monitor Probe Failures

Track probe failure rates:

  • Frequent liveness failures = application bugs or resource issues
  • Frequent readiness failures = dependency problems or slow startup

Common Probe Patterns

API Service

Liveness:  HTTP GET /health every 10s
Readiness: HTTP GET /ready every 5s (checks DB connection)

Database

Startup:   TCP :5432 every 10s, 30 failures allowed (5 min max)
Liveness:  TCP :5432 every 30s
Readiness: TCP :5432 every 10s

Background Worker

Liveness: Exec "/app/check-process.sh" every 30s

Stateless Web App

Liveness:  HTTP GET /healthz every 15s
Readiness: HTTP GET /healthz every 5s (same endpoint, different timing)

Health Check Endpoint Implementation

Node.js / Express

const express = require('express');
const app = express();

app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

app.get('/ready', async (req, res) => {
  try {
    await db.ping();
    res.status(200).send('Ready');
  } catch (error) {
    res.status(503).send('Not Ready');
  }
});

app.listen(3000);

Python / Flask

from flask import Flask
app = Flask(__name__)

@app.route('/health')
def health():
    return 'OK', 200

@app.route('/ready')
def ready():
    try:
        db.ping()
        return 'Ready', 200
    except Exception:
        return 'Not Ready', 503

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)

Go / net/http

package main

import (
    "net/http"
)

func healthHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}

func readyHandler(w http.ResponseWriter, r *http.Request) {
    if db.Ping() == nil {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("Ready"))
    } else {
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte("Not Ready"))
    }
}

func main() {
    http.HandleFunc("/health", healthHandler)
    http.HandleFunc("/ready", readyHandler)
    http.ListenAndServe(":8080", nil)
}

Troubleshooting Guide

Symptom: Container restarts repeatedly

Possible Causes:

  1. Liveness probe failing
  2. Initial delay too short
  3. Health endpoint doesn't exist

Solutions:

  • Check container logs: cod asset logs
  • Verify health endpoint works
  • Increase --initial-delay
  • Increase --failure-threshold

Symptom: No traffic reaching pods

Possible Causes:

  1. Readiness probe failing
  2. Application not fully initialized

Solutions:

  • Check readiness probe configuration
  • Verify /ready endpoint returns 200
  • Check for dependency issues (database, cache, etc.)

Symptom: Probes timeout

Possible Causes:

  1. Application is slow or overloaded
  2. Timeout setting too aggressive

Solutions:

  • Increase --timeout value
  • Optimize health check endpoint
  • Check resource limits (CPU, memory)


FAQ

Q: What's the difference between liveness and readiness probes?

A: Liveness checks if the container is alive (restarts if not). Readiness checks if the container should receive traffic (removes from load balancer if not ready).

Q: How often should probes run?

A: Liveness: Every 10-30 seconds. Readiness: Every 5-10 seconds. Startup: Every 5-10 seconds.

Q: Can I use the same endpoint for liveness and readiness?

A: You can, but it's better to separate them. Readiness should check dependencies; liveness should be simpler.

Q: What happens if all probes fail?

A: Liveness failure → Container restarts. Readiness failure → Pod removed from service. Startup failure → Container eventually restarts.

Q: Do probes cost extra?

A: No, probes are built into Kubernetes and Codiac. No additional cost.


Need help configuring probes? Contact Support or check our troubleshooting guide.