Troubleshooting SIGSEGV: segmentation fault in Linux containers (exit code 139)

Apr 13, 2022

The SIGSEGV Linux signal denotes a segmentation violation within a running process. Segmentation errors occur when a program tries to access memory that hasn’t been allocated to it or that it isn’t permitted to use. The cause can be a bug in the code or deliberate malicious activity.

SIGSEGV signals arise at the operating system level, but you’ll also encounter them in the context of containerization technologies like Docker and Kubernetes. When a container exits with status code 139, it’s because it received a SIGSEGV signal. The operating system terminated the container’s process to guard against a memory integrity violation.

It’s important to investigate what’s causing the segmentation errors if your containers are terminating with code 139. The exit code often points to a programming error in a language that gives you direct access to memory. If the error occurs in containers running a third-party image, there could be a bug inside that software or an incompatibility with your environment.

In this article, we’ll explain what SIGSEGV signals are, their impact on your Linux containers in Kubernetes, and the ways you can troubleshoot and handle segmentation faults in your application.

What’s a segmentation fault?

“Segmentation fault” can seem like an opaque term, but its meaning is simple: a process that receives a SIGSEGV signal tried to read or write memory it’s not allowed to access. The kernel will normally terminate the process to avoid memory corruption. This behavior can be modified by explicitly handling the signal in the program’s code.

Segmentation faults are named to reflect the way in which memory is partitioned by purpose. Data segments store values that can be determined at compile time, text segments hold program instructions, and heap segments encapsulate dynamically allocated variables created at runtime.

Most real-world segmentation faults involve the last category. Operations such as dereferencing uninitialized or dangling pointers, writing to read-only memory, and accessing an array out of bounds all try to use memory that the process hasn’t been granted access to.

Here’s a trivial example of a C program that exhibits a segmentation error:


Save the program as hello-world.c and compile it with make:

$ make hello-world

Now run the compiled binary:

$ ./hello-world

You’ll see the program immediately terminates, and a segmentation fault is reported. If you inspect the exit code, you’ll see it’s 139, corresponding to a segmentation error:

$ echo $?

139

Why did this happen? The program created a variable called buffer, but didn’t allocate it any memory. As a result, the assignment buffer[0] = 0 ended up writing to unallocated memory. You can fix the program by making sure buffer is large enough to cover the data it’ll store:


Allocating buffer one byte of memory is sufficient to handle the assigned value. This program will run successfully and exit with status code 0.

Segmentation faults in containers

Now let’s look at what happens when a segmentation fault occurs within a container. Here’s a simple Dockerfile for the crashing application written above:

c

Build your container image with the following command:

$ docker build -t segfault:latest .

Now start a container:

$ docker run segfault:latest

The container will start, run the command, and terminate immediately. Use docker ps with the -a flag to retrieve the stopped container’s details:

$ docker ps -a

CONTAINER ID   IMAGE             COMMAND         CREATED          STATUS
6e6944f7f339   segfault:latest   "hello-world"   17 seconds ago   Exited (139) 16 seconds ago

Exit code 139 is reported because of the segmentation error in the application.

Debugging Kubernetes segmentation errors

You can troubleshoot segmentation faults in Kubernetes containers, too. Use a project such as MicroK8s or K3s to start a local Kubernetes cluster on your machine. Next, create a pod manifest that starts a container using your image:

yaml

Use kubectl to add the pod to your cluster:

bash

Now retrieve the pod’s details:

$ kubectl get pod/segfault

NAME       READY   STATUS             RESTARTS     AGE
segfault   0/1     CrashLoopBackOff   1 (7s ago)   19s

The pod is stuck crashing in a restart loop. Use the describe command to find out the cause:

$ kubectl describe pod/segfault

The exit code is reported as 139, indicating that a segmentation error caused the application inside the container to crash.

Solving segmentation faults

Once you’ve identified segmentation errors as the cause of your container terminations, you can move on to mitigating them and preventing future recurrences.

If the error’s occurring inside a third-party container image, you will have limited options. You should raise an issue with the developer to investigate the cause of the unexpected memory access attempts. When the problem’s inside your own software, you can start more targeted troubleshooting efforts to work out what’s wrong.

Identifying problem code

First, look for any obvious areas of your code that could be impacted by segmentation issues. You might be able to use your container’s logs to work out the sequence of events leading up to the error:

$ kubectl logs pod/segfault

Use the container’s activity to work out where in the source the error originates. If there’s an array access, pointer reference, or unguarded memory write in the area, it could be the cause of the problem.

Environment incompatibilities

Another common cause of these errors is a shared library update that breaks compatibility with existing binaries. A binary built against one version of a library can perform invalid memory accesses when a different, incompatible version is loaded at runtime.

Try to revert any recent changes to the dependencies inside your containers. This can help eliminate issues that have been provoked by third-party library updates.

In rare cases, persistent segmentation faults with no obvious explanation can be caused by incompatibilities with the machine’s physical hardware. They might even be symptomatic of a memory fault. This kind of issue is less likely in a typical Kubernetes cluster running on a managed service from a public cloud provider. Running memtester can help you rule out physical problems when you’re maintaining your own hardware.

Targeted debugging

You can use Linux tools to debug SIGSEGV signals more precisely. Segmentation faults normally produce kernel log messages. Because containers run as processes on your host’s kernel, these messages are written even if the error occurred inside a container.

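Inspect your system log by viewing the contents of /var/log/syslog (the location used by Debian and Ubuntu; other distributions may use /var/log/messages or the systemd journal instead):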

$ tail -f /var/log/syslog

This command will continually stream logs to your terminal until you use Ctrl+C to cancel it. Now, try to reproduce the event that caused the segmentation error. The SIGSEGV signal will look like this in the log:

<program>[<pid>]: segfault at <address> ip <pointer> sp <pointer> error <code>

The log can be interpreted as follows:

  • at <address>: The forbidden memory address that the code tried to access.
  • ip <pointer>: The instruction pointer, which is the memory address of the code that committed the violation.
  • sp <pointer>: The stack pointer at the time of the fault, giving the address of the top of the stack.
  • error <code>: The error code indicates the type of operation that was attempted. Common codes on x86 systems include 6, writing to an unallocated area; 7, writing to an area that is mapped but can’t be written to; 4, reading from an unallocated area; and 5, reading from an area that is mapped but can’t be read.

Accessing the kernel log gives you a better understanding of what the code’s doing at the point the error occurs. Although this log isn’t directly accessible from within containers, you should still be able to retrieve details of segmentation faults if you have root access to the host machine.

Gracefully handling segmentation faults

Another way to resolve segmentation faults is to gracefully handle them inside your code. You can use libraries like segvcatch to capture SIGSEGV signals and convert them into software exceptions. You can then handle them like any other exception, giving you the chance to log details to your error-monitoring platform and recover without a crash.

While handling SIGSEGV is a good way to prevent hard failures, it’s still worth fully investigating and resolving each occurrence of this error. A segmentation fault indicates that the program is doing something that the Linux kernel explicitly forbids, pointing to serious reliability or security defects in your code. Merely catching and ignoring the signal could cause other problems in your program if it expects to have read or written memory which proved to be out of bounds.

Final thoughts

Segmentation faults occur when a program tries to use memory that it’s not allowed to access. They also arise when a program violates a memory region’s permissions, for example by writing to read-only memory. In this article, you’ve seen how these errors are often the result of simple programming mistakes. You’ve also looked at how to identify a segmentation error as the cause of container terminations, and how you can start troubleshooting segmentation faults you experience in your programs. Staying ahead of these errors helps your applications run reliably.

It can be difficult and time-consuming to troubleshoot segmentation faults, but powerful tools, like Airplane, can help streamline the process and make it simple and quick to solve errors in real time. Airplane is the developer platform for building custom internal tools.

With Airplane, users can build custom dashboards to help monitor and identify segmentation faults. Airplane's engineering workflows solution also makes it simple to build multi-step workflows for engineering-centric use cases, such as building a Postgres admin panel, deployment pipeline, AWS ECS dashboard, and more.

To try out Airplane and build your first monitoring dashboard within minutes, sign up for a free account or book a demo.

James Walker
James Walker is the founder of Heron Web, a UK-based digital agency providing bespoke software development services to SMEs.
