Health Probes

Ensure your applications are healthy and recover automatically from failures using Kubernetes health probes. Codiac simplifies probe configuration for both CLI and web UI workflows.

What Are Probes?

Health probes are automated checks that Kubernetes uses to monitor container health and make traffic routing decisions. Probes detect failures early and trigger automatic recovery actions.

Business Value:

Automatic failure detection: Identify unhealthy containers before they impact users
Self-healing infrastructure: Kubernetes automatically restarts failed containers
Less manual intervention: Transient failures recover without paging an on-call engineer
Improved uptime: Only route traffic to healthy instances

Types of Probes

Codiac supports all three Kubernetes probe types:

1. Liveness Probe

Question: "Is the container alive and functioning?"

Detects when a container is stuck, deadlocked, or in an unrecoverable state
Action: Kubernetes restarts the container if the liveness probe fails
Use Case: Detect deadlocks, infinite loops, or resource exhaustion that leave the container running but unresponsive

Example Scenario: Your API is running but stuck in an infinite loop. The liveness probe fails, Kubernetes kills the container and starts a fresh instance.

2. Readiness Probe

Question: "Is the container ready to serve traffic?"

Determines if a container should receive requests
Action: Kubernetes removes the pod from service endpoints if readiness probe fails
Use Case: Temporary unavailability (loading data, waiting for dependencies, warming up caches)

Example Scenario: During deployment, your application needs 30 seconds to populate its cache. The readiness probe ensures traffic only flows after the cache is ready.
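
The scenario above can be sketched with Python's standard library: a /ready endpoint that answers 503 until warm-up finishes and 200 afterwards. This is only an illustration under stated assumptions; `ReadinessGate`, `warm_cache`, and port 8080 are hypothetical names and values, not part of Codiac or Kubernetes.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class ReadinessGate:
    """Tracks whether warm-up has finished (hypothetical helper)."""

    def __init__(self):
        self._ready = threading.Event()

    def mark_ready(self):
        self._ready.set()

    def status(self):
        # 200 lets the readiness probe pass; 503 keeps the pod out of rotation.
        return (200, "Ready") if self._ready.is_set() else (503, "Not Ready")


gate = ReadinessGate()


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/ready":
            code, body = gate.status()
        else:
            code, body = 404, "Not Found"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())


def warm_cache():
    # Placeholder for the real cache population step.
    gate.mark_ready()


if __name__ == "__main__":
    # Warm up in the background so the server can answer 503 in the meantime.
    threading.Thread(target=warm_cache, daemon=True).start()
    ThreadingHTTPServer(("0.0.0.0", 8080), Handler).serve_forever()
```

Because the server starts listening before warm-up completes, the readiness probe gets a clear "Not Ready" answer instead of a connection error.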

3. Startup Probe

Question: "Has the container finished initializing?"

Gives slow-starting containers extra time to start before liveness checks begin
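Action: Holds back liveness and readiness probes until the startup probe succeeds; if the startup probe exhausts its failure threshold, Kubernetes restarts the container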
Use Case: Applications with long initialization (loading large datasets, database migrations)

Example Scenario: Your ML model takes 2 minutes to load into memory at startup. The startup probe gives it time to initialize before liveness probes start.

Probe Check Methods

Each probe can use one of three check methods:

Method	How It Works	Best For
HTTP GET	Sends HTTP request to a specific path	Web servers, REST APIs
TCP Socket	Attempts TCP connection to a port	Databases, non-HTTP services
Exec Command	Runs a command inside the container	Custom health logic

How Probes Work Together

Container Starts
      ↓
[Startup Probe] ────────> Success ───┐
      ↓                                │
    Failure (restart)                  │
                                       ↓
                        [Liveness Probe] + [Readiness Probe]
                                ↓                    ↓
                        Still alive?         Ready for traffic?
                                ↓                    ↓
                            Yes/No               Yes/No

Key Rules:

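Liveness and readiness probes do not run until the startup probe succeeds (if one is configured)
Readiness failures remove the pod from Service endpoints until the probe passes again; the container is not restarted
Liveness failures trigger a container restart, which clears a stuck process but does not fix the underlying cause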

Creating Probes with CLI

Use the cod asset probe create command to configure health checks for your assets.

Basic Syntax

cod asset probe create [COMMAND] [FLAGS]

HTTP Probe Example

Configure an HTTP liveness probe that checks the /health endpoint:

cod asset probe create \
  --asset my-api \
  --cabinet prod \
  --type liveness \
  --method http \
  --path /health \
  --port 3000 \
  --initial-delay 10 \
  --period 10 \
  --timeout 5 \
  --failure-threshold 3

Expected Outcome: Kubernetes will:

Wait 10 seconds after container starts
Send HTTP GET to http://container:3000/health every 10 seconds
Time out after 5 seconds if there is no response, and count that as a failure
Restart container after 3 consecutive failures
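
As a rough worked example of these settings, the sketch below estimates how long an application can be unresponsive before the restart is triggered. It is an approximation that ignores kubelet scheduling jitter, and `restart_window` is a hypothetical helper, not part of Codiac or Kubernetes.

```python
def restart_window(period, timeout, failure_threshold):
    """Approximate (min, max) seconds between the app hanging and the restart.

    min: the app hangs just before a probe and every probe fails immediately.
    max: the app hangs just after a passing probe and every probe runs to timeout.
    """
    fastest = (failure_threshold - 1) * period
    slowest = failure_threshold * period + timeout
    return fastest, slowest


if __name__ == "__main__":
    # Values from the HTTP probe example: period 10, timeout 5, threshold 3.
    print(restart_window(period=10, timeout=5, failure_threshold=3))  # (20, 35)
```

Lowering the period or the failure threshold shortens this window, at the cost of more restarts caused by brief slowdowns.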

TCP Socket Probe Example

Check if a database port is accepting connections:

cod asset probe create \
  --asset postgres-db \
  --cabinet prod \
  --type readiness \
  --method tcp \
  --port 5432 \
  --initial-delay 5 \
  --period 5
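
For intuition, a TCP check amounts to opening a connection and closing it again. The sketch below reproduces that from Python so the port can be tested by hand; `tcp_check` is a hypothetical helper, and 127.0.0.1:5432 assumes a local or port-forwarded PostgreSQL instance.

```python
import socket


def tcp_check(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port opens within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable hosts, and timeouts.
        return False


if __name__ == "__main__":
    reachable = tcp_check("127.0.0.1", 5432)
    print("accepting connections" if reachable else "not reachable")
```

A successful connection only shows that the port is open. It does not show that the database can answer queries, which is why an exec or HTTP check is sometimes preferred for readiness.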

Exec Command Probe Example

Run a custom health check script:

cod asset probe create \
  --asset batch-processor \
  --cabinet prod \
  --type liveness \
  --method exec \
  --command "/app/healthcheck.sh" \
  --initial-delay 30 \
  --period 20 \
  --timeout 10

Script Requirements:

Exit code 0 = healthy
Any other exit code = unhealthy
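
The check itself can be written in any language available in the container image. As one possible sketch in Python, the script below treats the process as healthy if a heartbeat file has been touched recently. The file path, the 60-second limit, and the assumption that your worker updates the file are all hypothetical.

```python
#!/usr/bin/env python3
import os
import sys
import time

HEARTBEAT_FILE = "/tmp/heartbeat"  # hypothetical path the worker touches
MAX_AGE_SECONDS = 60               # hypothetical staleness limit


def is_healthy(path, max_age, now=None):
    """Healthy if the file exists and was modified within max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) <= max_age
    except OSError:
        return False  # a missing or unreadable file counts as unhealthy


if __name__ == "__main__":
    # Exit code 0 = healthy, 1 = unhealthy, matching the requirements above.
    sys.exit(0 if is_healthy(HEARTBEAT_FILE, MAX_AGE_SECONDS) else 1)
```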

Probe Configuration Options

Flag	Description	Default	Example
`--asset`	Asset name	Required	`my-api`
`--cabinet`	Cabinet name	Required	`prod`
`--type`	Probe type: `liveness`, `readiness`, `startup`	Required	`liveness`
`--method`	Check method: `http`, `tcp`, `exec`	Required	`http`
`--path`	HTTP path (for HTTP method)	`/`	`/health`
`--port`	Container port to check (the exec example above omits it)	Required for `http` and `tcp`	`3000`
`--command`	Command to execute (for exec method)	-	`/app/check.sh`
`--initial-delay`	Seconds to wait before first probe	`0`	`10`
`--period`	Seconds between probes	`10`	`15`
`--timeout`	Seconds before probe times out	`1`	`5`
`--success-threshold`	Consecutive successes to mark healthy (Kubernetes requires `1` for liveness and startup probes)	`1`	`1`
`--failure-threshold`	Consecutive failures to mark unhealthy	`3`	`3`

Best Practice Examples

Fast-Starting Web Application

# Liveness: Check HTTP endpoint every 10s
cod asset probe create \
  --asset web-app \
  --type liveness \
  --method http \
  --path /healthz \
  --port 8080 \
  --period 10 \
  --failure-threshold 3

# Readiness: Wait for dependencies
cod asset probe create \
  --asset web-app \
  --type readiness \
  --method http \
  --path /ready \
  --port 8080 \
  --initial-delay 5 \
  --period 5

Slow-Starting Database

# Startup: Give it 5 minutes to initialize
cod asset probe create \
  --asset postgres \
  --type startup \
  --method tcp \
  --port 5432 \
  --initial-delay 0 \
  --period 10 \
  --failure-threshold 30  # 30 * 10s = 5 min max startup time

# Liveness: Once started, check every 30s
cod asset probe create \
  --asset postgres \
  --type liveness \
  --method tcp \
  --port 5432 \
  --period 30 \
  --failure-threshold 3
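
The startup budget in the comment above (30 * 10s = 5 min) is simply failure threshold times period. A small sketch of that arithmetic, useful when choosing a threshold for a known worst-case startup time; both function names are hypothetical helpers.

```python
import math


def startup_failure_threshold(max_startup_seconds, period_seconds):
    """Smallest threshold whose budget (threshold * period) covers the startup time."""
    return math.ceil(max_startup_seconds / period_seconds)


def startup_budget_seconds(failure_threshold, period_seconds, initial_delay=0):
    """Total time the startup probe allows before the container is restarted."""
    return initial_delay + failure_threshold * period_seconds


if __name__ == "__main__":
    print(startup_failure_threshold(300, 10))  # 30
    print(startup_budget_seconds(30, 10))      # 300
```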

Machine Learning Service

# Startup: Allow 3 minutes for model loading
cod asset probe create \
  --asset ml-inference \
  --type startup \
  --method http \
  --path /startup \
  --port 5000 \
  --period 10 \
  --failure-threshold 18  # 18 * 10s = 3 min max

# Readiness: Check model is loaded
cod asset probe create \
  --asset ml-inference \
  --type readiness \
  --method http \
  --path /ready \
  --port 5000 \
  --period 5

# Liveness: Ensure model server is responsive
cod asset probe create \
  --asset ml-inference \
  --type liveness \
  --method http \
  --path /health \
  --port 5000 \
  --period 15 \
  --timeout 10

Viewing Configured Probes

Check which probes are configured for an asset:

cod asset view my-api --cabinet prod

Expected Output:

Asset: my-api
Cabinet: prod
Probes:
  Liveness: HTTP GET :3000/health (every 10s, threshold: 3)
  Readiness: HTTP GET :3000/ready (every 5s, threshold: 3)

Troubleshooting Probe Failures

Check Pod Events

kubectl get events --field-selector involvedObject.name=my-api-pod

Look for:

Unhealthy - Probe is failing
Killing - Liveness probe triggered restart
Readiness probe failed - Pod removed from service

View Container Logs

cod asset logs my-api --cabinet prod

Common Issues

Problem: Liveness probe fails immediately after deployment

Solution: Increase --initial-delay to give the container time to start

Problem: Readiness probe never succeeds

Solution: Check if your /ready endpoint exists and returns 200 OK

Problem: Container restarts in a loop

Solution: Liveness probe may be too aggressive. Increase --period or --failure-threshold

Creating Probes in Web UI

Configure health probes through Codiac's visual interface.

Step 1: Navigate to Asset Configuration

Open Codiac web UI at https://app.codiac.io
Select your Enterprise
Select the Environment containing your asset
Navigate to the Cabinet
Click on your Asset

Step 2: Add Probe

Click Configure or Probes tab
Click Add Probe button
Select probe type from dropdown:
- Liveness Probe
- Readiness Probe
- Startup Probe

Step 3: Configure Probe Settings

HTTP Probe Configuration

Check Method: Select "HTTP"
Path: Enter endpoint path (e.g., /health, /healthz)
Port: Enter container port (e.g., 3000, 8080)
HTTP Headers: (Optional) Add custom headers if needed

Timing Settings:

Initial Delay: Seconds to wait before first check
Period: Seconds between checks
Timeout: Seconds before request times out
Success Threshold: Consecutive successes needed
Failure Threshold: Consecutive failures before action

TCP Socket Probe Configuration

Check Method: Select "TCP"
Port: Enter port number to check (e.g., 5432 for PostgreSQL)
Configure timing settings (same as HTTP)

Exec Command Probe Configuration

Check Method: Select "Exec"
Command: Enter full command to execute
- Use full paths (e.g., /app/healthcheck.sh)
- Command must exit with code 0 for success
Configure timing settings (same as HTTP)

Step 4: Save and Deploy

Click Save to persist probe configuration
Click Deploy to apply changes to running asset
Monitor deployment status

Expected Outcome:

Probe configuration saved to asset definition
Updated deployment includes health checks
Kubernetes begins monitoring container health

Viewing Probe Status

Asset Health Dashboard

Navigate to your asset in the web UI
View Health Status section
See real-time probe results:
- ✅ Green = Probe passing
- ⚠️ Yellow = Probe warning
- ❌ Red = Probe failing

Probe History

View historical probe failures:

Click History or Events tab
Filter by "Probe Failures"
See timestamps and failure counts

Editing Existing Probes

Navigate to asset configuration
Find probe in Probes section
Click Edit icon
Modify settings
Click Save and Deploy

Removing Probes

Navigate to asset configuration
Find probe in Probes section
Click Delete or Remove icon
Confirm deletion
Deploy changes to update running asset

Probe Best Practices

1. Always Configure Liveness and Readiness

At minimum, every production asset should have:

Liveness probe - Detect and recover from crashes
Readiness probe - Avoid sending traffic to initializing pods

2. Make Probes Lightweight

Probe checks run frequently. Keep them fast and simple:

✅ Good: Simple HTTP endpoint that returns 200 OK
❌ Bad: Full database query or complex computation
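
If a probe does need to consult a slow dependency, one way to keep it cheap is to cache the result for a few seconds so that most probe requests return immediately. The sketch below is one possible approach, not a Codiac feature; `CachedCheck` and its parameters are hypothetical.

```python
import time


class CachedCheck:
    """Runs an expensive check at most once per ttl_seconds and reuses the result."""

    def __init__(self, check, ttl_seconds=10.0, clock=time.monotonic):
        self._check = check
        self._ttl = ttl_seconds
        self._clock = clock
        self._last_run = None
        self._last_result = False

    def __call__(self):
        now = self._clock()
        if self._last_run is None or now - self._last_run >= self._ttl:
            try:
                self._last_result = bool(self._check())
            except Exception:
                # Treat any error in the underlying check as "unhealthy".
                self._last_result = False
            self._last_run = now
        return self._last_result
```

A readiness handler could then call something like `db_ok = CachedCheck(check_database, ttl_seconds=5)` once at startup and `db_ok()` on every probe request, where `check_database` stands in for your own function.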

3. Use Startup Probes for Slow Initialization

If your application takes >30 seconds to start:

Configure a startup probe with a generous failure-threshold
This prevents the liveness probe from restarting the container during startup

4. Set Appropriate Timeouts

Too Short: False positives, unnecessary restarts
Too Long: Slow failure detection, degraded user experience

Recommended:

Fast HTTP APIs: 5-10 second timeout
Databases: 10-30 second timeout
Complex services: 30-60 second timeout

5. Use Different Endpoints for Liveness vs. Readiness

Liveness: "Am I alive?" (simple, always passes unless broken)
Readiness: "Am I ready?" (checks dependencies, may fail temporarily)

Example:

// Liveness: Super simple
app.get('/health', (req, res) => res.status(200).send('OK'));

// Readiness: Check dependencies
app.get('/ready', async (req, res) => {
  const dbOk = await checkDatabase();
  const cacheOk = await checkCache();

  if (dbOk && cacheOk) {
    res.status(200).send('Ready');
  } else {
    res.status(503).send('Not Ready');
  }
});

6. Test Probe Endpoints Locally

Before deploying, verify your health check endpoints work:

# Test HTTP probe endpoint
curl -i http://localhost:3000/health

# Should return 200 OK
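
The same check can be scripted, for example in a pre-deploy test. The sketch below uses only Python's standard library and returns the status code so it can be asserted on; `probe_status` is a hypothetical helper and the URL is the one from the curl example. Kubernetes treats any status from 200 through 399 as a passing HTTP probe.

```python
import urllib.error
import urllib.request

# Ignore proxy environment variables so localhost requests go direct.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe_status(url, timeout=5.0):
    """Return the HTTP status code for a GET to url, or None if no response arrives."""
    try:
        with _opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # error statuses such as 503 are raised as HTTPError
    except (urllib.error.URLError, OSError):
        return None  # connection refused, DNS failure, or timeout


if __name__ == "__main__":
    status = probe_status("http://localhost:3000/health")
    print("healthy" if status is not None and 200 <= status < 400 else "unhealthy")
```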

7. Monitor Probe Failures

Track probe failure rates:

Frequent liveness failures = application bugs or resource issues
Frequent readiness failures = dependency problems or slow startup

Common Probe Patterns

API Service

Liveness:  HTTP GET /health every 10s
Readiness: HTTP GET /ready every 5s (checks DB connection)

Database

Startup:   TCP :5432 every 10s, 30 failures allowed (5 min max)
Liveness:  TCP :5432 every 30s
Readiness: TCP :5432 every 10s

Background Worker

Liveness: Exec "/app/check-process.sh" every 30s

Stateless Web App

Liveness:  HTTP GET /healthz every 15s
Readiness: HTTP GET /healthz every 5s (same endpoint, different timing)

Health Check Endpoint Implementation

Node.js / Express

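const express = require('express');
const app = express();

// `db` is assumed to be your application's database client,
// created elsewhere and exposing an async ping() method.

// Liveness: no dependencies, answers as long as the process can serve requests
app.get('/health', (req, res) => {
  res.status(200).send('OK');
});

// Readiness: report 503 while the database is unreachable
app.get('/ready', async (req, res) => {
  try {
    await db.ping();
    res.status(200).send('Ready');
  } catch (error) {
    res.status(503).send('Not Ready');
  }
});

app.listen(3000);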

Python / Flask

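from flask import Flask
app = Flask(__name__)

# `db` is assumed to be your application's database client,
# created elsewhere and exposing a ping() method.

# Liveness: no dependencies, answers as long as the process can serve requests
@app.route('/health')
def health():
    return 'OK', 200

# Readiness: report 503 while the database is unreachable
@app.route('/ready')
def ready():
    try:
        db.ping()
        return 'Ready', 200
    except Exception:
        return 'Not Ready', 503

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)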

Go / net/http

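package main

import (
    "database/sql"
    "log"
    "net/http"
)

// db must be opened with sql.Open before the server starts;
// the driver and connection string depend on your application.
var db *sql.DB

// Liveness: no dependencies, answers as long as the process can serve requests
func healthHandler(w http.ResponseWriter, r *http.Request) {
    w.WriteHeader(http.StatusOK)
    w.Write([]byte("OK"))
}

// Readiness: report 503 while the database is unreachable
func readyHandler(w http.ResponseWriter, r *http.Request) {
    if db.Ping() == nil {
        w.WriteHeader(http.StatusOK)
        w.Write([]byte("Ready"))
    } else {
        w.WriteHeader(http.StatusServiceUnavailable)
        w.Write([]byte("Not Ready"))
    }
}

func main() {
    http.HandleFunc("/health", healthHandler)
    http.HandleFunc("/ready", readyHandler)
    // ListenAndServe only returns on error, so log it and exit
    log.Fatal(http.ListenAndServe(":8080", nil))
}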

Troubleshooting Guide

Symptom: Container restarts repeatedly

Possible Causes:

Liveness probe failing
Initial delay too short
Health endpoint doesn't exist

Solutions:

Check container logs: cod asset logs
Verify health endpoint works
Increase --initial-delay
Increase --failure-threshold

Symptom: No traffic reaching pods

Possible Causes:

Readiness probe failing
Application not fully initialized

Solutions:

Check readiness probe configuration
Verify /ready endpoint returns 200
Check for dependency issues (database, cache, etc.)

Symptom: Probes time out

Possible Causes:

Application is slow or overloaded
Timeout setting too aggressive

Solutions:

Increase --timeout value
Optimize health check endpoint
Check resource limits (CPU, memory)

FAQ

Q: What's the difference between liveness and readiness probes?

A: Liveness checks if the container is alive (restarts if not). Readiness checks if the container should receive traffic (removes from load balancer if not ready).

Q: How often should probes run?

A: Liveness: Every 10-30 seconds. Readiness: Every 5-10 seconds. Startup: Every 5-10 seconds.

Q: Can I use the same endpoint for liveness and readiness?

A: You can, but it's better to separate them. Readiness should check dependencies; liveness should be simpler.

Q: What happens if all probes fail?

A: Liveness failure → Container restarts. Readiness failure → Pod removed from service. Startup failure → Container eventually restarts.

Q: Do probes cost extra?

A: No, probes are built into Kubernetes and Codiac. No additional cost.

Need help configuring probes? Contact Support or check our troubleshooting guide.
