Kubernetes probes are a mechanism for providing the Kubernetes control plane with information about the internal state of your applications. They let your cluster identify running pods that are in an unhealthy state.
Startup probes detect when a container’s workload has launched and the container is ready to use. Kubernetes relies on this information when determining whether a container can be targeted by liveness and readiness probes.
It’s important to add startup probes in conjunction with these other probe types. Otherwise, a container could be targeted by a liveness or readiness probe before it’s able to handle the request. This would cause the container to be restarted and flagged as unhealthy, when it was actually healthy but still initializing.
In this article, you’ll learn how to use a startup probe to prevent this scenario from occurring. We’ll also cover some of the common pitfalls and sticking points associated with these probes.
Kubernetes probe types
Kubernetes has three basic probe types:
- Liveness probes: Liveness probes detect whether a pod is healthy by running a command or making a network request inside the container. Containers that fail the check are restarted.
- Readiness probes: Readiness probes identify when a container is able to handle external traffic received from a service. Containers don’t become part of their services until they pass a readiness probe.
- Startup probes: Startup probes provide a way to defer the execution of liveness and readiness probes until a container indicates it’s able to handle them. Kubernetes won’t direct the other probe types to a container if it has a startup probe that hasn’t yet succeeded.
This article focuses on startup probes. As their role is to prevent other probes from running, you’ll always use them alongside liveness and readiness probes. A startup probe doesn’t alter your workload’s behavior on its own.
Startup probes should be used when the application in your container could take a significant amount of time to reach its normal operating state. Applications that would crash or throw an error if they handled a liveness or readiness probe during startup need to be protected by a startup probe. This ensures the container doesn’t enter a restart loop due to failing health checks before it’s finished launching.
Configuring startup probes
Startup probes support the four basic Kubernetes probing mechanisms:
- Exec: Executes a command within the container. The probe succeeds if the command exits with a 0 code.
- HTTP: Makes an HTTP call to a URL within the container. The probe succeeds if the container issues an HTTP response in the 200-399 range.
- TCP: The probe succeeds if a specific container port is accepting traffic.
- gRPC: Makes a gRPC health checking request to a port inside the container and uses its result to determine whether the probe succeeded.
All these mechanisms share some basic parameters that control the probe’s success criteria and how frequently it’s checked:
- `initialDelaySeconds`: Sets a delay between the time the container starts and the first time the probe is executed. Defaults to zero seconds.
- `periodSeconds`: Defines how frequently the probe will be executed after the initial delay. Defaults to ten seconds.
- `timeoutSeconds`: Each probe will time out and be marked as failed after this many seconds. Defaults to one second.
- `failureThreshold`: Instructs Kubernetes to retry the probe this many times after a failure is first recorded. The container will only be restarted if the retries also fail. Defaults to three.
Effective configuration of a startup probe relies on these values being set correctly.
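To make these concrete, here’s an illustrative `startupProbe` stanza with each parameter set explicitly to its default value; the `exec` action is a placeholder:

```yaml
startupProbe:
  exec:
    command: ["cat", "/etc/hostname"]  # placeholder probe action
  initialDelaySeconds: 0   # wait before the first check (default: 0)
  periodSeconds: 10        # run the check every ten seconds (default: 10)
  timeoutSeconds: 1        # fail any check that takes longer than this (default: 1)
  failureThreshold: 3      # give up after this many consecutive failures (default: 3)
```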
Creating a startup probe
Startup probes are created by adding a `startupProbe` field within the `spec.containers` portion of a pod’s manifest. Here’s a simple example of a startup probe using the `exec` mechanism. It runs a command inside the container:
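A minimal sketch of such a manifest, assuming a `busybox` container kept alive by an illustrative `sleep` command:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: busybox:latest
      # Illustrative stand-in for a real workload
      command: ["sleep", "3600"]
      startupProbe:
        exec:
          # Succeeds once /etc/hostname can be read inside the container
          command: ["cat", "/etc/hostname"]
        periodSeconds: 10
        failureThreshold: 10
```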
Add the pod to your cluster using kubectl:
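Assuming the manifest is saved as `startup-probe-demo.yaml` (the filename is an arbitrary choice):

```bash
$ kubectl apply -f startup-probe-demo.yaml
pod/startup-probe-demo created
```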
The container will start and run normally. You can verify this by viewing its details in kubectl:
$ kubectl describe pod startup-probe-demo
<omitted>
Events:
| Type | Reason | Age | From | Message |
|---|---|---|---|---|
| Normal | Scheduled | 9s | default-scheduler | Successfully assigned default/startup-probe-demo to default |
| Normal | Pulling | 8s | kubelet | Pulling image "busybox:latest" |
| Normal | Pulled | 7s | kubelet | Successfully pulled image "busybox:latest" in 860.669288ms |
| Normal | Created | 7s | kubelet | Created container startup-probe-demo |
| Normal | Started | 7s | kubelet | Started container startup-probe-demo |
The probe in the example above uses the presence of the `/etc/hostname` file to determine whether the container has started. As this file exists inside the container, the startup probe will succeed without logging any events.
The values of `periodSeconds` and `failureThreshold` need to be adjusted to suit your own application. Together, they should cover the container’s maximum permitted startup time. In the example above, a `periodSeconds` of `10` and a `failureThreshold` of `10` means the container will have up to a hundred seconds in which to start: up to ten checks with ten seconds between them. The container will be restarted if the probe still doesn’t succeed after this time.
You can use the other config parameters to further tune your probe. If you know a container has a minimum startup time, setting `initialDelaySeconds` will prevent it from being probed immediately after creation, when you know the check will fail.
Adjusting and troubleshooting probes
Here’s an example of a pod with a startup probe that will fail:
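A sketch of such a manifest, reusing the illustrative `busybox` setup from above but pointing the probe at a file that doesn’t exist:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: busybox:latest
      # Illustrative stand-in for a real workload
      command: ["sleep", "3600"]
      startupProbe:
        exec:
          # /etc/foobar doesn't exist in the image, so this check always fails
          command: ["cat", "/etc/foobar"]
        periodSeconds: 10
        failureThreshold: 10
```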
In this case, the probe looks at `/etc/foobar`, which doesn’t exist in the container. The probe will run every ten seconds, as specified by the value of `periodSeconds`. Up to ten attempts will be made, as allowed by `failureThreshold`. If the container creates `/etc/foobar` before the last attempt, the probe will succeed, and Kubernetes will begin to direct liveness and readiness probes to the container. Otherwise, the startup probe will be marked as failed, and the container will be killed.
You can inspect failing startup probes by retrieving the pod’s events with kubectl:
$ kubectl describe pod startup-probe-demo
<omitted>
Events:
| Type | Reason | Age | From | Message |
|---|---|---|---|---|
| Normal | Scheduled | 2m42s | default-scheduler | Successfully assigned default/startup-probe-demo to default |
| Normal | Pulling | 2m41s | kubelet | Pulling image "busybox:latest" |
| Normal | Pulled | 2m40s | kubelet | Successfully pulled image "busybox:latest" in 860.669288ms |
| Normal | Created | 2m40s | kubelet | Created container startup-probe-demo |
| Normal | Started | 2m40s | kubelet | Started container startup-probe-demo |
| Warning | Unhealthy | 61s (x10 over 2m31s) | kubelet | Startup probe failed: cat: can't open '/etc/foobar': No such file or directory |
| Normal | Pulling | 60s | kubelet | Pulling image "busybox:latest" |
| Normal | Killing | 59s | kubelet | Container startup-probe-demo failed startup probe, will be restarted |
This event log shows that the startup probe failed because of the missing `/etc/foobar` file. After ten attempts, the kubelet recorded a `Killing` event, and a restart was scheduled. Looking for `failed startup probe` lines in your pod’s events will help you find containers that have been restarted for this reason.
HTTP probes
HTTP probes are created in a similar manner to exec commands. They’re considered failed when the issued response lies outside the 200-399 status range. Nest an `httpGet` field instead of `exec` in your `startupProbe` definition:
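A minimal sketch; the path and port are assumptions about where the workload serves HTTP:

```yaml
startupProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 10
  failureThreshold: 10
```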
The `startupProbe.httpGet` field supports optional `host`, `scheme`, `path`, and `httpHeaders` fields to customize the request that’s made. The `host` defaults to the pod’s internal IP address; the default scheme is `http`. The following pod manifest includes a startup probe that makes an HTTPS request with a custom header:
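A sketch of such a manifest, using the `nginx:latest` image seen in the events below; the header name and value are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      image: nginx:latest
      startupProbe:
        httpGet:
          scheme: HTTPS
          path: /
          port: 443
          httpHeaders:
            # Hypothetical custom header attached to each probe request
            - name: X-Probe-Source
              value: startup-probe
        periodSeconds: 10
        failureThreshold: 10
```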
Apply the pod to your cluster with kubectl:
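Again assuming the manifest is saved as `startup-probe-demo.yaml`:

```bash
$ kubectl apply -f startup-probe-demo.yaml
pod/startup-probe-demo created
```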
Now retrieve the pod’s events to check whether the probe has succeeded:
$ kubectl describe pod startup-probe-demo
<omitted>
Events:
| Type | Reason | Age | From | Message |
|---|---|---|---|---|
| Normal | Scheduled | 12s | default-scheduler | Successfully assigned default/startup-probe-demo to default |
| Normal | Pulling | 11s | kubelet | Pulling image "nginx:latest" |
| Normal | Pulled | 10s | kubelet | Successfully pulled image "nginx:latest" in 797.884311ms |
| Normal | Created | 10s | kubelet | Created container startup-probe-demo |
| Normal | Started | 10s | kubelet | Started container startup-probe-demo |
| Warning | Unhealthy | 8s | kubelet | Startup probe failed: Get "https://10.244.0.163/": http: server gave HTTP response to HTTPS client |
This example leaves the pod in an unhealthy state because the startup probe fails. The NGINX image is not configured to support HTTPS by default, so the probe received an invalid response.
TCP probes
TCP probes try to open a socket to your container on a specified port. Add a `tcpSocket.port` field to your `startupProbe` configuration to use this probe type:
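A minimal sketch, assuming the workload listens on port 80:

```yaml
startupProbe:
  tcpSocket:
    port: 80
  periodSeconds: 10
  failureThreshold: 10
```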
The probe will be considered failed if the socket can’t be opened.
gRPC probes
gRPC probes were introduced in Kubernetes v1.23 behind the `GRPCContainerProbe` feature gate, which has been enabled by default since v1.24. Add a `grpc.port` field to your pod’s `startupProbe` to define where health checks should be directed:
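Here’s a sketch of such a pod; the `etcd` image tag and launch flags are illustrative assumptions:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: startup-probe-demo
spec:
  containers:
    - name: startup-probe-demo
      # Image tag and launch flags are illustrative
      image: quay.io/coreos/etcd:v3.5.6
      command:
        - etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
      startupProbe:
        grpc:
          # gRPC health checks are sent to etcd's client port
          port: 2379
        periodSeconds: 10
        failureThreshold: 10
```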
The `etcd` container image is used here as an example of a gRPC-compatible service. Kubernetes will send gRPC health check requests to port 2379 in the container. The startup probe will be marked as failed if the container issues an unhealthy response.
Common problems
Misconfigured startup probes can easily lead to restart loops. You must pay attention to your probe’s configuration to make sure it’s suited to your application.
If your container takes longer to start than the window offered by the probe’s `periodSeconds` and `failureThreshold`, it’ll be restarted before the probe completes. The replacement container won’t start in time either, creating an endless loop of restarts that prevents your workload from becoming operational. You should measure your application’s typical startup time and use that to determine your `periodSeconds`, `failureThreshold`, and `initialDelaySeconds` values.
Conversely, another common issue is startup probes that are too conservative, leading to excessive delays before new containers become available. You can avoid this by using a short `periodSeconds` in conjunction with a very high `failureThreshold`. This will let Kubernetes rapidly poll your container’s status, ensuring its startup is noticed with minimal delay while avoiding premature failure due to the threshold being reached.
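An illustrative stanza implementing this pattern; the probe action and exact values are placeholders:

```yaml
startupProbe:
  httpGet:
    path: /
    port: 80
  periodSeconds: 1         # poll every second so startup is noticed quickly
  failureThreshold: 300    # but tolerate up to five minutes before restarting
```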
Should startup probes match liveness/readiness probes?
It’s often effective to configure startup probes with the same command or HTTP request as your liveness and readiness probes. By using this technique, you can guarantee that liveness and readiness probes will succeed once Kubernetes begins directing them to the container.
Depending on your application’s implementation, using a different command or request could create a situation where the startup probe succeeds, but subsequent probes still can’t be handled correctly. This can be confusing to debug. Mirroring liveness and readiness probe actions in your startup probe helps ensure reliability; failures in the action during the startup phase won’t have any negative effects, provided a success occurs before the startup probe’s `failureThreshold` is reached.
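As a sketch of this pattern, the startup probe below mirrors the action used by the liveness and readiness probes; the endpoint and timings are placeholder assumptions:

```yaml
containers:
  - name: app
    image: nginx:latest
    startupProbe:
      httpGet:
        path: /        # same action as the probes below
        port: 80
      periodSeconds: 5
      failureThreshold: 60   # allow up to five minutes to start
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
```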
Final thoughts
Startup probes let your containers inform Kubernetes when they’ve started up and are ready to be assessed for liveness and readiness. It’s good practice to add a startup probe wherever you’re using liveness and readiness probes, as otherwise, containers may get restarted before they’ve finished initializing. In this guide, we’ve explored the use cases for startup probes and shown how you can create and troubleshoot them.
If you're looking for an easy way to monitor the internal state of your applications, consider using Airplane. With Airplane Views, you can create custom dashboards that help you track and monitor your applications and notify internal users when issues arise. You can also use Airplane Tasks to build single or multi-step operations to manage and update your applications.
Airplane also offers strong built-ins, such as job scheduling, permissions setting, and audit logs, that make it easy to ensure the safety and security of your internal operations.
To try it out and build your first internal workflow within minutes, sign up for a free account or book a demo.