Post-Deploy Jobs Against Live Services

When an onPostDeploy job needs to connect to services that the same asset just deployed (a database migration runner, a data seeder, an integration smoke test), timing matters. Codiac fires onPostDeploy as soon as the deployment manifests are applied, not after the new service instances are up and accepting traffic, so a job that connects immediately can fail against a service that is still starting.

The solution is to build the wait into the job itself using a startup gate: a short-lived container that runs before your main job container and exits only when the target service is ready. Your job never starts until the gate passes, and Codiac's deployment pipeline moves on without blocking.

How it works

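Add an initContainers entry to your job spec. Init containers run one at a time, in order: each must exit 0 before the next one starts, and all of them must succeed before the main container starts. If an init container fails, what happens next depends on the restartPolicy in the job's pod template. With OnFailure, the init container is rerun in place until it succeeds. With Never, the pod is marked failed and the job starts a replacement pod, which counts against backoffLimit. No changes to your Codiac configuration are needed; the gate lives entirely in the job manifest.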
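As a rough sketch of where the gate sits in a complete manifest, the Job below chains two gates ahead of the main container. The Job name, the service names my-database and my-api, and their ports are hypothetical placeholders, and restartPolicy: OnFailure is one reasonable choice rather than a requirement.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: post-deploy-migration        # hypothetical name
spec:
  template:
    spec:
      restartPolicy: OnFailure       # Jobs accept only OnFailure or Never
      initContainers:
        # Runs first; must exit 0 before the next gate starts.
        - name: wait-for-database
          image: busybox
          command:
            - sh
            - -c
            - until nc -z my-database 5432; do sleep 2; done
        # Runs second, only after wait-for-database has succeeded.
        - name: wait-for-api
          image: busybox
          command:
            - sh
            - -c
            - until wget -q -O /dev/null http://my-api:8080/health; do sleep 2; done
      containers:
        # Starts only after both gates have exited 0.
        - name: migration-runner
          image: my-migration-image
```

A single gate is usually enough; chaining is only worth it when the job depends on more than one service.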

Option 1: TCP port check

Best for any service that accepts TCP connections: databases, message queues, APIs.

spec:
  template:
    spec:
      initContainers:
        - name: wait-for-service
          image: busybox
          command:
            - sh
            - -c
            - until nc -z my-service 5432; do echo "waiting..."; sleep 2; done
      containers:
        - name: migration-runner
          image: my-migration-image

Replace my-service with the service name and 5432 with the port. The loop retries every two seconds until the TCP port responds, then exits 0 and lets the main container start.

Limitation: Confirms the port is open, not that the application is ready to serve requests.
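One way to narrow that gap for a database is to swap the raw port check for the database's own readiness probe. The sketch below assumes the target is PostgreSQL (suggested by port 5432 in the example, but not stated) and uses pg_isready, which ships with the official postgres image; the image tag is an assumption.

```yaml
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-postgres
          image: postgres:16           # assumption: any image that includes pg_isready
          command:
            - sh
            - -c
            # pg_isready exits 0 only when the server is accepting connections;
            # it exits non-zero while the server is still starting or unreachable.
            - until pg_isready -h my-service -p 5432; do echo "waiting..."; sleep 2; done
      containers:
        - name: migration-runner
          image: my-migration-image
```

pg_isready does not authenticate, so it confirms that the server accepts connections, not that the job's credentials or target database are valid.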

Option 2: HTTP health check

Best for services with a /health or /ready endpoint.

spec:
  template:
    spec:
      initContainers:
        - name: wait-for-api
          image: busybox
          command:
            - sh
            - -c
            - until wget -qO- http://my-service:8080/health; do echo "waiting..."; sleep 2; done
      containers:
        - name: migration-runner
          image: my-migration-image

wget exits non-zero on a connection failure or a non-2xx response, so the loop retries until the application confirms it is ready at the HTTP layer.

Limitation: Requires the service to expose a health endpoint.
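The loop above waits indefinitely if the service never becomes healthy. A bounded variant might look like the following sketch, where the limit of 90 attempts and the 2-second request timeout are illustrative values, not recommendations from Codiac.

```yaml
spec:
  template:
    spec:
      initContainers:
        - name: wait-for-api
          image: busybox
          command:
            - sh
            - -c
            - |
              attempts=0
              # -T 2 caps each request at 2 seconds; -O /dev/null discards the body.
              until wget -q -T 2 -O /dev/null http://my-service:8080/health; do
                attempts=$((attempts + 1))
                if [ "$attempts" -ge 90 ]; then
                  echo "service not ready after $attempts attempts; giving up"
                  exit 1
                fi
                echo "waiting ($attempts/90)..."
                sleep 2
              done
      containers:
        - name: migration-runner
          image: my-migration-image
```

When the gate exits 1, the pod's restartPolicy decides whether the gate is rerun in place or the pod is marked failed.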

Option 3: Full rollout check

Best when you need to confirm every instance of the service is healthy — not just the first one that answered.

spec:
  template:
    spec:
      serviceAccountName: post-deploy-runner
      initContainers:
        - name: wait-for-rollout
          image: bitnami/kubectl
          command:
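            # rollout status returns only once every updated replica is available;
            # the Available condition alone can already be true while old replicas still serve.
            - kubectl
            - rollout
            - status
            - deployment/my-service
            - --timeout=300s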
      containers:
        - name: migration-runner
          image: my-migration-image

The init container waits up to 5 minutes for the my-service deployment to report all instances healthy, then exits 0. If the timeout expires first, kubectl exits non-zero and the gate fails.

Setup required: The job's service account (post-deploy-runner in the example) needs get, list, and watch permissions on deployments in the same cabinet. Create a Role granting those permissions and bind it to the service account.
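A minimal sketch of that setup follows. It assumes the cabinet corresponds to a Kubernetes namespace, written here as the placeholder my-cabinet; the Role and RoleBinding names are also hypothetical. The verbs include list alongside get and watch, since kubectl generally lists a resource before watching it.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: post-deploy-runner
  namespace: my-cabinet              # placeholder: the namespace backing your cabinet
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: deployment-reader            # hypothetical name
  namespace: my-cabinet
rules:
  # Read-only access to Deployment state; no write verbs are granted.
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: post-deploy-runner-deployment-reader   # hypothetical name
  namespace: my-cabinet
subjects:
  - kind: ServiceAccount
    name: post-deploy-runner
    namespace: my-cabinet
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: deployment-reader
```

Because this is a Role and not a ClusterRole, the access is limited to deployments in that one namespace.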

Limitation: More setup overhead than the other options; requires a service account with read access to deployment state.

Option 4: Retry on failure

Best when the service starts within seconds and an initial failure is acceptable.

Instead of a startup gate, configure the job to retry on failure:

spec:
  backoffLimit: 5
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migration-runner
          image: my-migration-image

If the container fails because the service is not yet ready, Kubernetes restarts it with exponential back-off until it succeeds or backoffLimit is reached, at which point the job is marked failed.

Limitation: Whenever the service is not ready when the job starts, the first attempt fails before a later one succeeds. If you have alerting on job failures, that alert can fire on every deploy. Prefer one of the startup gate approaches if clean deploy logs matter to you.
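If you go this route, it can help to cap the retries by time as well as by count. The sketch below adds activeDeadlineSeconds, a standard Job field; the Job name and the 300-second value are illustrative assumptions.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: post-deploy-migration        # hypothetical name
spec:
  backoffLimit: 5                    # stop after 5 failed attempts
  activeDeadlineSeconds: 300         # or after 5 minutes in total, whichever comes first
  template:
    spec:
      restartPolicy: OnFailure
      containers:
        - name: migration-runner
          image: my-migration-image
```

When the deadline is reached, Kubernetes terminates the job's running pods and marks the job failed, so a service that never comes up produces one clear failure instead of an open-ended retry loop.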

Which to use

Scenario	Approach
Any TCP service (database, broker, API)	TCP check
Service exposes a health endpoint	HTTP health check
Need all instances confirmed healthy before proceeding	Full rollout check
Service starts fast; initial failure is acceptable	Retry

For most migration runners and seed loaders, Option 2 (HTTP health check) is the right default — straightforward to configure and gives a meaningful application-layer signal.
