Troubleshooting SIGTERM: graceful termination of Linux containers (exit code 143)

May 1, 2022

SIGTERM is a signal that Linux and other Unix-based operating systems use to request the termination of a running process. In normal circumstances, your application should respond to a SIGTERM by running cleanup procedures that facilitate graceful termination. If processes aren’t ready to terminate, they may elect to ignore or block the signal.

An understanding of SIGTERM can help you work out why containers in Docker and Kubernetes are being stopped. The signal is intended to let your application detect and respond to impending terminations initiated outside the running process. It’s typically issued before the harsher SIGKILL signal, which is used to forcefully terminate a process that has ignored a SIGTERM or failed to exit in time.

In this article, you’ll learn how SIGTERM signals impact your Kubernetes containers, and how you can handle them so you can implement graceful terminations. First, you’ll explore the purpose of SIGTERM and how it’s used by the kernel, giving you a full understanding of what happens when a container is terminated.

What is SIGTERM?

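SIGTERM (signal 15) is the standard Unix process termination signal, and it’s the usual first step when a process needs to be stopped. It’s normally sent by another process: a user running the kill command, an init system such as systemd stopping a service, or a container runtime such as Docker stopping a container. Situations that demand an immediate stop, such as the kernel running out of memory, skip SIGTERM and go straight to SIGKILL, as covered below.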

SIGTERM’s purpose is to let processes gracefully terminate of their own accord, instead of being forcefully killed by the kernel. A forced termination can lead to data corruption, so it’s used as the option of last resort. Applications with long-lived database connections, open file handles, or active network communications can intercept SIGTERM signals to finish their current activity and safely close stateful resources.
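
In Python, one way to “finish the current activity” is to have the SIGTERM handler do nothing but set a flag, and let the main loop check that flag between units of work. The sketch below assumes a simple loop over queued jobs; process_next_item() is a hypothetical placeholder, and signal.raise_signal() stands in for an external kill command so the example ends on its own.

```python
import signal
import threading

# Set by the SIGTERM handler; the main loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    # Keep the handler tiny: just record that termination was requested.
    shutdown_requested.set()


signal.signal(signal.SIGTERM, handle_sigterm)


def process_next_item(n: int) -> None:
    """Hypothetical unit of work, such as handling one queued job."""
    pass


processed = 0
while not shutdown_requested.is_set():
    process_next_item(processed)
    processed += 1
    if processed == 3:
        # Demonstration only: simulate `kill <pid>` arriving mid-run.
        signal.raise_signal(signal.SIGTERM)

# Close connections and file handles here, after the current item has finished.
print(f"processed {processed} items, shutting down cleanly")
```

Because the handler only sets a flag, the job that was in progress when the signal arrived always runs to completion before cleanup starts.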

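When a process is terminated by SIGTERM, shells such as bash report its exit status as 143, which is 128 plus the signal number 15. This is also the exit code you’ll see in Docker and Kubernetes when a container is terminated by the SIGTERM signal. A process that catches SIGTERM and then exits on its own chooses its own exit status instead, commonly 0 for a clean shutdown.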
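
The same 128 + N rule explains the 137 you’ll see for SIGKILL later on. As a small Python sketch using the standard signal module (the two helper names are illustrative, not a standard API), converting between signals and the exit codes a shell or container runtime reports could look like this:

```python
import signal
from typing import Optional


def exit_code_for(sig: signal.Signals) -> int:
    """Exit status a shell such as bash reports for a process killed by `sig`."""
    return 128 + int(sig)


def signal_for(exit_code: int) -> Optional[signal.Signals]:
    """Best-effort reverse lookup: the signal implied by an exit status above 128."""
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128)
    except ValueError:
        return None  # Not a signal number known on this platform.


print(exit_code_for(signal.SIGTERM))  # 143
print(exit_code_for(signal.SIGKILL))  # 137
print(signal_for(143).name)           # SIGTERM
```

A program can also choose to exit with a status above 128 by itself, so the reverse lookup is a strong hint rather than proof.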

Issuing a SIGTERM

To see SIGTERM in action, open two terminals. In the first terminal, run sleep to create a long-running command:

```bash
$ sleep 300
```

This will block the terminal window while the command runs for five minutes. Switch to your second terminal, and run ps -aux to discover the process ID (PID) of the sleep command:

```bash
$ ps -aux
```

In this example, we can see the sleep command is executing as PID 3074856.

Pass the PID to the kill command to issue a SIGTERM to the process. Despite its name, kill issues a SIGTERM by default, allowing the process to stop gracefully.

```bash
$ kill 3074856
```

In your first terminal window, where you ran the sleep command, you should see the process terminate and drop back to the shell:

```bash
$ sleep 300
Terminated
```

Inspecting the exit code in this terminal will reveal it to be 143, indicating the process terminated in response to a SIGTERM signal.

```bash
$ echo $?
143
```
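
If you’d rather script the experiment than juggle two terminals, Python’s subprocess module can play both roles. This sketch starts a child that sleeps for five minutes, sends it SIGTERM with os.kill() (the same kill(2) system call the kill command uses), and then checks how it exited. Note that subprocess reports a signal death as a negative number (-15) instead of the shell’s 143.

```python
import os
import signal
import subprocess
import sys

# Equivalent of running `sleep 300` in the first terminal.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(300)"])

# Equivalent of `kill <pid>` in the second terminal: SIGTERM by default.
os.kill(child.pid, signal.SIGTERM)
returncode = child.wait()

print(returncode)        # -15: subprocess's way of saying "killed by signal 15"
print(128 - returncode)  # 143: the value bash reports as $?
```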

What about SIGKILL?

SIGKILL (signal 9, exit code 137) is issued later in the process termination sequence. While SIGTERM can be seen as a “please stop when possible,” SIGKILL is an urgent “stop now.”

Processes aren’t able to handle, block, or ignore SIGKILL. Once a SIGKILL has been issued, the kernel terminates the process almost immediately, without giving it a chance to delay or clean up.
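
You can confirm this from Python: asking the kernel to install a handler for SIGKILL, or to ignore it, fails with EINVAL, which the signal module surfaces as an OSError. A minimal sketch, assuming it runs in the main thread on Linux:

```python
import errno
import signal


def handler(signum, frame):
    print("this can never run for SIGKILL")


errors = []
for disposition in (handler, signal.SIG_IGN):
    try:
        signal.signal(signal.SIGKILL, disposition)
    except OSError as exc:
        # sigaction(2) refuses to change the action for SIGKILL.
        errors.append(exc.errno)

print(errors == [errno.EINVAL, errno.EINVAL])  # True
```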

You can issue a SIGKILL using kill with the -9 flag. This instructs the command to send a SIGKILL instead of SIGTERM. Create two new terminal windows to repeat the example from above. Run sleep 300 in the first and then kill the process using the second window:

```bash
$ ps -aux
```

The process is PID 3084295; now it can be sent a SIGKILL with kill -9:

```bash
$ kill -9 3084295
```

Back in the first window, you’ll see sleep exit and a Killed message appear in the terminal output:

```bash
$ sleep 300
Killed
```

Retrieving the exit code will confirm it’s 137, meaning a SIGKILL signal was received.

```bash
$ echo $?
137
```
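
A scripted Python version of this experiment only needs a different signal. Sending SIGKILL through os.kill() mirrors kill -9, and subprocess reports the result as -9, which corresponds to the shell’s 137. As before, the sleeping child is a stand-in for the sleep command:

```python
import os
import signal
import subprocess
import sys

# Equivalent of `sleep 300` in the first terminal.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(300)"])

# Equivalent of `kill -9 <pid>`: the child gets no chance to react.
os.kill(child.pid, signal.SIGKILL)
returncode = child.wait()

print(returncode, 128 - returncode)  # -9 137
```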

The importance of understanding SIGTERM

Handling SIGTERM ensures your applications terminate properly without risking data corruption. Understanding the differences between SIGTERM and SIGKILL can also help you identify the reason why a process has been stopped. If it received a SIGTERM, it’s an indication you could have reacted to the signal to prevent a bad state from occurring.

On the other hand, applications that receive unexpected SIGKILL signals can indicate there are bigger problems with your environment. Outside of deliberate kill -9 commands and processes that ignored an earlier SIGTERM, a SIGKILL usually means the kernel needed to immediately cull its process list. This is normally due to the out-of-memory (OOM) killer intervening to prevent RAM exhaustion. Regular unexpected SIGKILLs should be investigated by checking whether your host has enough physical memory to reliably support its workloads.

SIGTERM in Kubernetes

The Kubernetes pod termination process is based on the SIGTERM and SIGKILL mechanism. When a pod is terminated, the main process (PID 1) of each of its containers receives a SIGTERM signal. The containers keep running, which gives them an opportunity to terminate gracefully.

If the container hasn’t stopped thirty seconds after the SIGTERM was sent, Kubernetes gives up waiting and uses SIGKILL to forcefully terminate it. The SIGKILL stops all the running processes associated with the pod’s containers, and Kubernetes then removes the pod object itself. The thirty-second delay is configurable; you’ll see how to change this below.
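
The SIGTERM, wait, SIGKILL sequence is easy to model for an ordinary child process. The following Python sketch illustrates the pattern; it is not Kubernetes’ actual implementation. stop_with_grace_period() is a hypothetical helper, and the one-second grace period stands in for Kubernetes’ thirty seconds so the demo finishes quickly.

```python
import signal
import subprocess
import sys


def stop_with_grace_period(proc: subprocess.Popen, grace_period: float) -> int:
    """Send SIGTERM, wait up to grace_period seconds, then fall back to SIGKILL."""
    proc.send_signal(signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_period)
    except subprocess.TimeoutExpired:
        proc.kill()  # Sends SIGKILL on POSIX systems.
        return proc.wait()


# A child that ignores SIGTERM and prints "ready" once it is doing so.
STUBBORN = (
    "import signal, time\n"
    "signal.signal(signal.SIGTERM, signal.SIG_IGN)\n"
    "print('ready', flush=True)\n"
    "time.sleep(300)\n"
)

with subprocess.Popen([sys.executable, "-c", "import time; time.sleep(300)"]) as polite:
    polite_status = stop_with_grace_period(polite, grace_period=1.0)

with subprocess.Popen([sys.executable, "-c", STUBBORN], stdout=subprocess.PIPE, text=True) as stubborn:
    stubborn.stdout.readline()  # Wait until SIGTERM is actually being ignored.
    stubborn_status = stop_with_grace_period(stubborn, grace_period=1.0)

print(polite_status, stubborn_status)  # -15 -9
```

The child that keeps the default SIGTERM behavior exits within the grace period, while the one that ignores SIGTERM is only stopped by the fallback SIGKILL.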

All this occurs each time you delete a Kubernetes pod. To send a SIGTERM to a pod’s containers, use the kubectl delete command, which always defaults to graceful termination.

```bash
$ kubectl delete pod my-pod
```

The command returns once all the containers in the pod have actually terminated. If there’s a delay after running the command, it’s often because one or more of the container processes are handling the SIGTERM signal to gracefully terminate themselves. (Object finalizers are the other common source of delays when removing pods.)

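kubectl does have a way to skip the wait. Adding --force together with --grace-period=0 to a kubectl delete command removes the pod from the Kubernetes API immediately, without waiting for confirmation that its containers have stopped. On the node, the containers are still given a very short grace period before they’re sent a SIGKILL.

```bash
$ kubectl delete pod my-pod --grace-period=0 --force
```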

Changing the grace period

As mentioned above, Kubernetes defaults to allowing thirty seconds for container SIGTERM handlers to complete. When that time expires, a SIGKILL will be issued to force the container’s termination.

This value can be changed by setting the spec.terminationGracePeriodSeconds field on your pods. It defines the maximum time Kubernetes will wait, after issuing a SIGTERM, for the pod’s containers to terminate.

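```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-pod
spec:
  terminationGracePeriodSeconds: 90
  containers:
    # Placeholder container: substitute your own name and image.
    - name: app
      image: nginx:latest
```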

Applying this pod to your cluster (kubectl apply -f pod.yaml) allows its containers a longer period in which they can gracefully terminate. If a container used all the available time, a kubectl delete pod my-pod command would seem to hang for ninety seconds before a SIGKILL is issued.
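
Because SIGKILL arrives as soon as the grace period expires, a SIGTERM handler that drains outstanding work is safer if it budgets its own time. The Python sketch below assumes the application knows its grace period through a constant kept in sync with the manifest by hand; GRACE_PERIOD_SECONDS, SAFETY_MARGIN_SECONDS, and drain() are all hypothetical names.

```python
import time
from collections import deque
from typing import Callable

# Assumption: kept in sync by hand with spec.terminationGracePeriodSeconds.
GRACE_PERIOD_SECONDS = 90.0
# Hypothetical headroom left for closing connections before SIGKILL.
SAFETY_MARGIN_SECONDS = 5.0


def drain(jobs: deque, budget: float, run_job: Callable[[object], None]) -> int:
    """Run queued jobs until the queue is empty or the time budget is spent.

    Returns the number of jobs left unprocessed, e.g. so they can be re-queued.
    """
    deadline = time.monotonic() + budget
    while jobs and time.monotonic() < deadline:
        run_job(jobs.popleft())
    return len(jobs)


# In a real service this would run after SIGTERM; here it is called directly.
remaining = drain(
    deque(["job-1", "job-2", "job-3"]),
    budget=GRACE_PERIOD_SECONDS - SAFETY_MARGIN_SECONDS,
    run_job=lambda job: None,  # Placeholder for real work.
)
print(remaining)  # 0
```

Using time.monotonic() keeps the deadline correct even if the system clock is adjusted while the container is shutting down.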

It’s important to note that it’s still possible for a container to be killed immediately, without receiving a SIGTERM or a corresponding grace period. A container that’s stopped because of an out-of-memory (OOM) condition, such as exceeding its memory limit, receives an instant, unavoidable SIGKILL, just like a standard Linux process.

Handling SIGTERM in your code

Now that you know what SIGTERM does and when it’s used, it’s a good idea to modify your own applications to properly support it. Each programming language provides its own mechanism to listen for and handle operating system signals. Attaching a handler to SIGTERM will let you run code just before the process terminates.

Here’s a simple example in Python:

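```python
import signal
import sys
import time


def handle_sigterm(signum, frame):
    print("Received SIGTERM, cleaning up")
    # Close database connections, flush open files, etc. here.
    sys.exit(0)


signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate a long-running process until a SIGTERM arrives.
while True:
    time.sleep(1)
```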

The code configures a signal handler that calls the handle_sigterm() function when a SIGTERM is received.

Here’s the same code implemented in Node.js:

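```javascript
function handleSigterm() {
  console.log("Received SIGTERM, cleaning up");
  // Close database connections, flush open files, etc. here.
  process.exit(0);
}

process.on("SIGTERM", handleSigterm);

// Keep the process alive to simulate a long-running task.
setInterval(() => {}, 1000);
```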

A SIGTERM handler makes sense in any program which could be interrupted during a long-lived operation that needs to run to completion.
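
For an operation that must not be interrupted halfway, another option, hinted at in the introduction, is to block SIGTERM for just that section. A blocked signal stays pending and is delivered as soon as it’s unblocked. This Python sketch uses signal.pthread_sigmask(), which is available on Unix; sigterm_deferred() is a hypothetical helper, and signal.raise_signal() simulates a SIGTERM arriving mid-operation.

```python
import signal
from contextlib import contextmanager

received = []


def handle_sigterm(signum, frame):
    received.append(signum)


signal.signal(signal.SIGTERM, handle_sigterm)


@contextmanager
def sigterm_deferred():
    """Block SIGTERM for the calling thread so it stays pending until the block ends."""
    previous = signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    try:
        yield
    finally:
        # Restoring the old mask delivers any pending SIGTERM to the handler.
        signal.pthread_sigmask(signal.SIG_SETMASK, previous)


with sigterm_deferred():
    signal.raise_signal(signal.SIGTERM)  # Simulated SIGTERM mid-operation.
    during = list(received)              # The handler hasn't run yet.
    # The work that must run to completion would go here.

print(during, received)  # during is empty; received now holds SIGTERM (15)
```

Blocking only postpones the signal. In Kubernetes the grace period keeps counting while SIGTERM is pending, so a section that runs too long will still be cut short by SIGKILL.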

Final thoughts

Linux sends SIGTERM signals to processes when they’re about to be terminated. The process can handle the signal to implement graceful cleanup procedures, such as ending network activity and closing file handles. It should then exit in a timely manner to fulfill the termination request. A process that doesn’t terminate in response to a SIGTERM may be forcefully killed by a later SIGKILL signal.

Kubernetes uses SIGTERM and SIGKILL within its own container termination process. Deleting a pod first issues a SIGTERM to the pod’s containers, giving them up to the configured grace period to clean up. Containers that don’t quit in time will receive a SIGKILL that enacts an instant termination.

If you're looking to build a dashboard or pipeline to help troubleshoot and fix errors like this, consider using Airplane. Airplane is the developer platform for building internal tools. The basic building blocks of Airplane are Tasks, which are single or multi-step functions that anyone on your team can use. Airplane also offers Views, a React-based platform for quickly building internal UIs. With Airplane, you can easily transform Python scripts, SQL queries, APIs, and more into these powerful internal tools for your engineering workflows.

Sign up for a free account or book a demo to build powerful internal tools within minutes.

James Walker
James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs.
