Kubernetes startup probe - a practical guide


Apr 18, 2022

Kubernetes probes are a mechanism for providing the Kubernetes control plane with information about the internal state of your applications. They let your cluster identify running pods that are in an unhealthy state.

Startup probes detect when a container’s workload has launched and the container is ready to use. Kubernetes relies on this information when determining whether a container can be targeted by liveness and readiness probes.

It’s important to add startup probes in conjunction with these other probe types. Otherwise, a container could be targeted by a liveness or readiness probe before it’s able to handle the request. This would cause the container to be restarted and flagged as unhealthy, when it was actually healthy but still initializing.

In this article, you’ll learn how to use a startup probe to prevent this scenario from occurring. We’ll also cover some of the common pitfalls and sticking points associated with these probes.

Kubernetes probe types

Kubernetes has three basic probe types:

  • Liveness probes: Liveness probes detect whether a container is healthy by running a command inside it or making a network request to it. Containers that fail the check are restarted.
  • Readiness probes: Readiness probes identify when a container is able to handle external traffic received from a service. Containers don’t become part of their services until they pass a readiness probe.
  • Startup probes: Startup probes provide a way to defer the execution of liveness and readiness probes until a container indicates it’s able to handle them. Kubernetes won’t direct the other probe types to a container if it has a startup probe that hasn’t yet succeeded.

In this article, you’ll be focusing on startup probes. As their role is to prevent other probes from running, you’ll always be using them alongside liveness and readiness probes. A startup probe doesn’t alter your workload’s behavior on its own.

Startup probes should be used when the application in your container could take a significant amount of time to reach its normal operating state. Applications that would crash or throw an error if they handled a liveness or readiness probe during startup need to be protected by a startup probe. This ensures the container doesn’t enter a restart loop due to failing healthiness checks before it’s finished launching.

Configuring startup probes

Startup probes support the four basic Kubernetes probing mechanisms:

  • Exec: Executes a command within the container. The probe succeeds if the command exits with a 0 code.
  • HTTP: Makes an HTTP call to a URL within the container. The probe succeeds if the container issues an HTTP response in the 200-399 range.
  • TCP: The probe succeeds if a specific container port is accepting traffic.
  • gRPC: Makes a gRPC health checking request to a port inside the container and uses its result to determine whether the probe succeeded.

All these mechanisms share some basic parameters that control the probe’s success criteria and how frequently it’s checked:

  • initialDelaySeconds: Set a delay between the time the container starts and the first time the probe is executed. Defaults to zero seconds.
  • periodSeconds: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
  • timeoutSeconds: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
  • failureThreshold: The number of consecutive failed checks Kubernetes tolerates before it treats the probe as failed. For a startup probe, the container is restarted once this many checks in a row have failed. Defaults to three.

Effective configuration of a startup probe relies on these values being set correctly.
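
As a rough illustration, the fragment below writes out all four parameters on a startup probe, using their default values. The probed file /tmp/started is a hypothetical placeholder rather than anything this guide prescribes.

```yaml
# Fragment of a container spec, not a complete manifest.
startupProbe:
  exec:
    # Hypothetical check; substitute a command that suits your application.
    command: ["cat", "/tmp/started"]
  initialDelaySeconds: 0   # wait before the first check (default 0)
  periodSeconds: 10        # interval between checks (default 10)
  timeoutSeconds: 1        # per-check timeout (default 1)
  failureThreshold: 3      # consecutive failures tolerated (default 3)
```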

Creating a startup probe

Startup probes are created by adding a startupProbe field within the spec.containers portion of a pod’s manifest. Here’s a simple example of a startup probe using the exec mechanism. It runs a command inside the container:

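The original listing isn’t reproduced here. A minimal sketch consistent with the description that follows might look like this. The pod name, image, probed file, and parameter values come from the surrounding text; the sleep command is an assumption, added so the busybox container stays alive long enough to be probed.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: busybox:latest
      # Assumption: keep the container running so there is something to probe.
      command: ["sleep", "3600"]
      startupProbe:
        exec:
          # Succeeds because /etc/hostname exists inside the container.
          command: ["cat", "/etc/hostname"]
        periodSeconds: 10
        failureThreshold: 10
```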

Add the pod to your cluster by passing the manifest file to kubectl apply with the -f flag.

The container will start and run normally. You can verify this by viewing its details in kubectl:

$ kubectl describe pod startup-probe-demo

<omitted>

Events:

TYPE    REASON     AGE  FROM               MESSAGE
----    ------     ---  ----               -------
Normal  Scheduled  9s   default-scheduler  Successfully assigned default/startup-probe-demo to default
Normal  Pulling    8s   kubelet            Pulling image "busybox:latest"
Normal  Pulled     7s   kubelet            Successfully pulled image "busybox:latest" in 860.669288ms
Normal  Created    7s   kubelet            Created container startup-probe-demo
Normal  Started    7s   kubelet            Started container startup-probe-demo

The probe in the example above uses the presence of the /etc/hostname file to determine whether the container has started. As this file exists inside the container, the startup probe will succeed without logging any events.

The values of periodSeconds and failureThreshold need to be adjusted to suit your own application. Together, they should cover the container’s maximum permitted startup time. In the example above, a periodSeconds of 10 and a failureThreshold of 10 mean the container has up to a hundred seconds in which to start: up to ten checks with ten seconds between them. The container will be restarted if the probe still doesn’t succeed after this time.

You can use the other config parameters to further tune your probe. If you know a container has a minimum startup time, setting initialDelaySeconds will prevent it from being probed immediately after creation, when you know the check will fail.

Adjusting and troubleshooting probes

Here’s an example of a pod with a startup probe that will fail:

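The original listing isn’t reproduced here. A sketch that matches the description below would differ from a working exec probe only in the file it checks. As before, the sleep command is an assumption that keeps the busybox container running.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: busybox:latest
      # Assumption: keep the container running so the probe has a target.
      command: ["sleep", "3600"]
      startupProbe:
        exec:
          # Fails on every attempt because /etc/foobar is never created.
          command: ["cat", "/etc/foobar"]
        periodSeconds: 10
        failureThreshold: 10
```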

In this case, the probe looks at /etc/foobar, which doesn’t exist in the container. The probe will run every ten seconds, as specified by the value of periodSeconds. Up to ten attempts will be made, as allowed by failureThreshold. If the container creates /etc/foobar before the last attempt, the probe will succeed, and Kubernetes will begin to direct liveness and readiness probes to the container. Otherwise, the startup probe will be marked as failed, and the container will be killed.

You can inspect failing startup probes by retrieving the pod’s events with kubectl:

$ kubectl describe pod startup-probe-demo

<omitted>

Events:

TYPE     REASON     AGE                   FROM               MESSAGE
----     ------     ---                   ----               -------
Normal   Scheduled  2m42s                 default-scheduler  Successfully assigned default/startup-probe-demo to default
Normal   Pulling    2m41s                 kubelet            Pulling image "busybox:latest"
Normal   Pulled     2m40s                 kubelet            Successfully pulled image "busybox:latest" in 860.669288ms
Normal   Created    2m40s                 kubelet            Created container startup-probe-demo
Normal   Started    2m40s                 kubelet            Started container startup-probe-demo
Warning  Unhealthy  61s (x10 over 2m31s)  kubelet            Startup probe failed: cat: can't open '/etc/foobar': No such file or directory
Normal   Pulling    60s                   kubelet            Pulling image "busybox:latest"
Normal   Killing    59s                   kubelet            Container startup-probe-demo failed startup probe, will be restarted

This event log shows that the startup probe failed because of the missing /etc/foobar file. After ten failed attempts, a Killing event was recorded and the container was scheduled for a restart. Looking for “failed startup probe” messages in your pod’s events will help you find containers that have been restarted for this reason.

HTTP probes

HTTP probes are created in a similar manner to exec probes. They’re considered failed when the issued response lies outside the 200-399 status range. Nest an httpGet field instead of exec in your startupProbe definition:

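A minimal sketch of such a definition follows, shown as a fragment of a container spec. The path and port are assumptions for illustration; use whichever endpoint your application actually serves.

```yaml
# Fragment of a container spec, not a complete manifest.
startupProbe:
  httpGet:
    path: /    # assumed endpoint
    port: 80   # assumed container port
  periodSeconds: 10
  failureThreshold: 10
```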

The startupProbe.httpGet field requires a port and supports optional host, scheme, path, and httpHeaders fields to customize the request that’s made. The host defaults to the pod’s internal IP address; the default scheme is HTTP. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:

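The original listing isn’t reproduced here. The sketch below is one manifest that fits the description: the pod name and nginx image come from the event output further down, while the header name and value are hypothetical. Port 80 is also an assumption, chosen because probing NGINX’s plain HTTP port over HTTPS would produce the kind of failure shown in those events.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: nginx:latest
      ports:
        - containerPort: 80
      startupProbe:
        httpGet:
          scheme: HTTPS
          path: /
          port: 80                    # assumed: NGINX's default HTTP port
          httpHeaders:
            - name: X-Probe-Source    # hypothetical header name
              value: startup-check    # hypothetical header value
        periodSeconds: 10
        failureThreshold: 10
```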

Apply the pod to your cluster by passing the manifest file to kubectl apply with the -f flag.

Now retrieve the pod’s events to check whether the probe has succeeded:

$ kubectl describe pod startup-probe-demo

<omitted>

Events:

TYPE     REASON     AGE  FROM               MESSAGE
----     ------     ---  ----               -------
Normal   Scheduled  12s  default-scheduler  Successfully assigned default/startup-probe-demo to default
Normal   Pulling    11s  kubelet            Pulling image "nginx:latest"
Normal   Pulled     10s  kubelet            Successfully pulled image "nginx:latest" in 797.884311ms
Normal   Created    10s  kubelet            Created container startup-probe-demo
Normal   Started    10s  kubelet            Started container startup-probe-demo
Warning  Unhealthy  8s   kubelet            Startup probe failed: Get "https://10.244.0.163/": http: server gave HTTP response to HTTPS client

This example leaves the pod in an unhealthy state because the startup probe fails. The NGINX image is not configured to support HTTPS by default, so the probe received an invalid response.

TCP probes

TCP probes try to open a socket to your container on a specified port. Add a tcpSocket.port field to your startupProbe configuration to use this probe type:

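A minimal sketch of a TCP startup probe, shown as a fragment of a container spec. The port number is an assumption; set it to the port your application listens on.

```yaml
# Fragment of a container spec, not a complete manifest.
startupProbe:
  tcpSocket:
    port: 80   # assumed container port
  periodSeconds: 10
  failureThreshold: 10
```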

The probe will be considered failed if the socket can’t be opened.

gRPC probes

gRPC probes were introduced in Kubernetes v1.23 as an alpha feature that requires the GRPCContainerProbe feature gate to be enabled; from v1.24 onward the gate is enabled by default. Add a grpc.port field to your pod’s startupProbe to define where health checks should be directed:

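The original listing isn’t reproduced here. The sketch below probes port 2379 as the text describes. The image tag and etcd command-line flags are assumptions modeled on the etcd example in the Kubernetes documentation, so check them against the image you actually deploy.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: registry.k8s.io/etcd:3.5.1-0   # assumed image and tag
      command:                              # assumed startup flags
        - /usr/local/bin/etcd
        - --data-dir=/var/lib/etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
      ports:
        - containerPort: 2379
      startupProbe:
        grpc:
          port: 2379
        periodSeconds: 10
        failureThreshold: 10
```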

The etcd container image is used here as an example of a gRPC-compatible service. Kubernetes will send gRPC health check requests to port 2379 in the container. The startup probe will be marked as failed if the container issues an unhealthy response.

Common problems

Misconfigured startup probes can easily lead to restart loops. You must pay attention to your probe’s configuration to make sure it’s suited to your application.

If your container takes longer to start than the window offered by the probe’s periodSeconds and failureThreshold, it’ll be restarted before the probe completes. The replacement container won’t start in time either, creating an endless loop of restarts that prevents your workload from becoming operational. You should measure your application’s typical startup time and use that to determine your periodSeconds, failureThreshold, and initialDelaySeconds values.

Conversely, another common issue is startup probes that are too conservative, leading to excessive delays in new containers becoming available. You can avoid this by using a short periodSeconds in conjunction with a very high failureThreshold. This will let Kubernetes rapidly poll your container’s status, ensuring its startup is noticed with minimal delay while avoiding premature failure due to the threshold being reached.
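
As a sketch of that approach, the fragment below polls every two seconds while still allowing a generous startup window. The endpoint, port, and specific numbers are hypothetical and should be derived from your own measurements.

```yaml
# Fragment of a container spec, not a complete manifest.
startupProbe:
  httpGet:
    path: /healthz   # hypothetical endpoint
    port: 8080       # hypothetical port
  periodSeconds: 2        # poll every two seconds
  failureThreshold: 150   # 150 checks x 2 s = up to 300 s to start
```

With these values, a container that becomes ready after 20 seconds is noticed within roughly one polling interval, while a slow start of several minutes still avoids a restart.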

Should startup probes match liveness/readiness probes?

It’s often effective to configure startup probes with the same command or HTTP request as your liveness and readiness probes. With this technique, a successful startup probe is strong evidence that the liveness and readiness probes will also succeed once Kubernetes begins directing them to the container.

Depending on your application’s implementation, using a different command or request could create a situation where the startup probe succeeds, but subsequent probes still can’t be handled correctly. This can be confusing to debug. Mirroring liveness and readiness probe actions in your startup probe helps ensure reliability; failures in the action during the startup phase won’t have any negative effects, provided a success occurs before the startup probe’s failureThreshold is reached.
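
The mirroring idea can be sketched as a container spec fragment in which all three probes issue the same request. The /healthz path, port 8080, and timing values are hypothetical.

```yaml
# Fragment of a container spec: all three probes share one request.
startupProbe:
  httpGet:
    path: /healthz   # hypothetical endpoint
    port: 8080       # hypothetical port
  periodSeconds: 5
  failureThreshold: 30   # 30 checks x 5 s = up to 150 s to start
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
readinessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
```

Only the startup probe carries the large failureThreshold; the liveness and readiness probes keep their defaults so that problems after startup are still detected quickly.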

Final thoughts

Startup probes let your containers inform Kubernetes when they’ve started up and are ready to be assessed for liveness and readiness. It’s good practice to add a startup probe wherever you’re using liveness and readiness probes, as otherwise, containers may get restarted before they’ve finished initializing. In this guide, we’ve explored the use cases for startup probes and shown how you can create and troubleshoot them.


James Walker
James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs.
