Debugging your Kubernetes cluster, pods, and containers

Dec 17, 2021
12 min read

When something goes wrong, it’s often the DevOps engineer who’s responsible for detecting and solving the issue immediately. While Kubernetes has many benefits when it comes to speed and resource efficiency, its added complexity can make debugging more difficult. In order to resolve problems efficiently and avoid interruptions for end users, it’s vitally important that you understand how to debug a Kubernetes cluster.

In this article, you’ll learn how to quickly and effectively debug your cluster, nodes, pods, and containers.

Cluster level debugging

Let’s say there’s an issue with your Kubernetes cluster. Clusters are made up of several components, like nodes and the control plane, and a problem with any one of them can affect the cluster as a whole. To debug the cluster successfully, you can try the suggestions in the following sections.

Obtaining information about your clusters

The first step toward debugging your cluster is to gather more information about its components. Run the following command:

kubectl cluster-info

The kubectl cluster-info command outputs information about the status of the control plane and CoreDNS. As seen below, the command shows there are no issues with the cluster’s control plane and that CoreDNS is running correctly.

Kubernetes control plane is running at https://192.168.49.2:8443
CoreDNS is running at https://192.168.49.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

To get more detailed information about your cluster, you can run:

kubectl cluster-info dump
The output of the previous command was not included due to its length.

The kubectl cluster-info dump command gives detailed information about your cluster and activities carried out on the cluster.
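If you want to keep this dump for later analysis, kubectl can write it to a directory instead of standard output. The directory path below is just an example:

kubectl cluster-info dump --output-directory=/tmp/cluster-state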

Getting the status of your node

An unhealthy node is a problem and affects the cluster as a whole. To get the status of a node, run:

kubectl get nodes

The command’s output shows you the name, status, roles, age, and the Kubernetes version that runs the node.

Below, the node named minikube has the Ready status, so it’s running fine. If any of your nodes reports a NotReady status instead, you can assume there’s an issue with that node.

NAME       STATUS   ROLES                  AGE     VERSION
minikube   Ready    control-plane,master   3d12h   v1.22.2
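If a node is NotReady, a good next step is to describe it; the Conditions section of the output reports problems such as memory pressure, disk pressure, or a kubelet that has stopped reporting:

kubectl describe node minikube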

Getting the health status of cluster components

Kubernetes clusters have different components, like the scheduler, the controller manager, and etcd. Knowing the health status of these components will save you time when debugging your cluster. To get their health status, run:

kubectl get componentstatuses

As you can see below, the scheduler is unhealthy, while the controller manager and etcd are healthy.

NAME                 STATUS      MESSAGE                                                                                       ERROR
scheduler            Unhealthy   Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251: connect: connection refused
controller-manager   Healthy     ok
etcd-0               Healthy     {"health":"true","reason":""}
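Note that componentstatuses is deprecated on newer Kubernetes versions (v1.19+). On recent clusters, you can query the API server’s health endpoints directly instead:

kubectl get --raw='/readyz?verbose'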

Getting activities in your cluster

Viewing all of the events that have taken place on your cluster is another effective way to debug it. You can spot errors that occurred while a particular action was being carried out in a given namespace.

To get all the events that occurred in your cluster, you can run:

kubectl get events

The output below shows the details of recent events on your cluster and why each action occurred.

LAST SEEN   TYPE     REASON             OBJECT                                MESSAGE
40s         Normal   Scheduled          pod/worker-deploy-799b5fb489-7c4sx    Successfully assigned default/worker-deploy-799b5fb489-7c4sx to minikube
37s         Normal   Pulled             pod/worker-deploy-799b5fb489-7c4sx    Container image "kodekloud/examplevotingapp_worker:v1" already present on machine
35s         Normal   Created            pod/worker-deploy-799b5fb489-7c4sx    Created container worker-app
35s         Normal   Started            pod/worker-deploy-799b5fb489-7c4sx    Started container worker-app
41s         Normal   Killing            pod/worker-deploy-799b5fb489-gc9xr    Stopping container worker-app
41s         Normal   SuccessfulCreate   replicaset/worker-deploy-799b5fb489   Created pod: worker-deploy-799b5fb489-7c4sx
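Events aren’t always listed chronologically; sorting by creation timestamp often makes the sequence easier to follow:

kubectl get events --sort-by=.metadata.creationTimestamp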

Pods and container debugging

If your cluster and nodes are healthy but you still have issues, it’s time to examine the pods and the containers running inside them. You may have mistakenly tried to run a pod with a nonexistent image, for example. The suggestions below are a helpful starting point.

Describing pods

You may have issues with your pods due to problems with the containers inside them. When you run kubectl get pods, you might notice a status like ImagePullBackOff or Pending instead of the normal Running status. Running kubectl describe pod <podname> can help you quickly get to the root of the trouble. Let’s dive into these two examples.

Pods with ImagePullBackOff error status

To simulate a pod with the ImagePullBackOff error status, create a pod that references the nonexistent image busybox888. Copy the YAML content below and store it inside a file named pods.yaml. Then, create the pod with kubectl create -f pods.yaml.

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name: demo-container   # the container name is arbitrary
      image: busybox888      # deliberately misspelled image name

Get the pods running on your cluster with kubectl get pods. You can see the output of this command below. Notice that the demopod has ImagePullBackOff status.

NAME                                 READY   STATUS             RESTARTS       AGE
demopod                              0/1     ImagePullBackOff   0              2m13s
postgres-deploy-8695749f5f-nnd67     1/1     Running            3 (3h4m ago)   9d
redis-deploy-5d7988b4bb-9kntq        1/1     Running            3 (3h4m ago)   9d
result-app-deploy-b8f4fc44b-8phh2    1/1     Running            3 (3h4m ago)   9d
voting-app-deploy-547678ccc7-67sh5   1/1     Running            3 (3h4m ago)   9d
worker-deploy-799b5fb489-7c4sx       1/1     Running            0              37m

After seeing the ImagePullBackOff status, you need to identify its cause. The kubectl describe pod command gives a comprehensive overview of your pod.

To get an overview of demopod, run:

kubectl describe pod demopod

If you scroll down after running the command, you’ll notice the output below showing the reason for the error. The error occurred because Kubernetes couldn’t pull the image from the Docker repository. The ImagePullBackOff error can occur if the image doesn’t exist or is hosted inside a private Docker registry that’s not accessible to Kubernetes.

The error can also occur when Kubernetes tries to pull a container with an image that’s not hosted on the Docker public registry but in your local computer system. Kubernetes will try to pull the image multiple times without success, leading to an ImagePullBackOff error.

If you take a look at the output of the kubectl describe pod demopod command, you’ll notice an ErrImagePull error, which occurs each time Kubernetes fails to pull the container’s image. After several ErrImagePull failures, Kubernetes backs off and retries at progressively longer intervals, which is reported as ImagePullBackOff. In this scenario, every pull fails because the busybox888 image doesn’t exist.

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  27m                    default-scheduler  Successfully assigned default/demopod to minikube
  Normal   Pulling    25m (x4 over 27m)      kubelet            Pulling image "busybox888"
  Warning  Failed     25m (x4 over 27m)      kubelet            Failed to pull image "busybox888": rpc error: code = Unknown desc = Error response from daemon: pull access denied for busybox888, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
  Warning  Failed     25m (x4 over 27m)      kubelet            Error: ErrImagePull
  Warning  Failed     25m (x4 over 27m)      kubelet            Error: ImagePullBackOff
  Normal   BackOff    2m26s (x101 over 27m)  kubelet            Back-off pulling image "busybox888"

The correct name for the image is busybox, not busybox888. You’ll need to change the image name from busybox888 to busybox, as shown below.
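One way to apply the fix is to edit pods.yaml and recreate the pod. Since a container’s image is one of the few mutable pod fields, you can also patch it in place; the container name demo-container matches the pod definition above:

kubectl set image pod/demopod demo-container=busybox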

If you’re sure that the image name is correct but you still see ImagePullBackOff, the error could be occurring because the container image is hosted in a private Docker registry. To resolve this, you can specify the authorization details inside a secret, as sketched below.
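A minimal sketch, assuming you’ve already created a registry secret named regcred (for example, with kubectl create secret docker-registry), is to reference that secret from the pod spec; the pod name and image here are hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod                            # hypothetical pod name
spec:
  imagePullSecrets:
    - name: regcred                                  # secret holding the registry credentials
  containers:
    - name: app
      image: registry.example.com/myteam/myapp:v1    # hypothetical private image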

Pods with Pending status

When you run the kubectl get pods command, you can sometimes see pods with Pending status. To simulate that scenario, create 1,000 pods on your Kubernetes cluster.

Copy the content below and save it in a file named deployment.yaml.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demodeploy
spec:
  replicas: 1000                 # deliberately more pods than the node can hold
  selector:
    matchLabels:
      app: demodeploy
  template:
    metadata:
      labels:
        app: demodeploy
    spec:
      containers:
        - name: demo-container   # the container name and image are placeholders
          image: nginx

Run kubectl apply -f deployment.yaml to apply the deployment, then run kubectl get pods to list your pods. Notice that a pod called demodeploy-6df58566f5-2p969 has Pending status. You can use kubectl describe pod <podname> to get more details about why.

NAME                          READY   STATUS        RESTARTS   AGE
demodeploy-6df58566f5-26jwq   0/1     Terminating   0          12m
demodeploy-6df58566f5-2p969   0/1     Pending       0          5m43s
demodeploy-6df58566f5-2vwn9   0/1     Pending       0          5m39s
demodeploy-6df58566f5-2xftp   0/1     Pending       0          4s
demodeploy-6df58566f5-4jrvc   0/1     Pending       0          7s
demodeploy-6df58566f5-4rp6m   0/1     Pending       0          5m42s
demodeploy-6df58566f5-4xb7n   0/1     Pending       0          5m33s
demodeploy-6df58566f5-4zpsk   0/1     Pending       0          5m50s

To find out why demodeploy-6df58566f5-2p969 is pending, run kubectl describe pod demodeploy-6df58566f5-2p969. In your case, the pending pod will likely have a different name; substitute its name into the command. After running it, scroll to the bottom of the output, where you’ll see the information shown below.

Status:                      Pending
...
Volumes:
  kube-api-access-np625:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  5m44s (x4 over 6m37s)  default-scheduler  0/1 nodes are available: 1 Too many pods.

The FailedScheduling event shows that there are too many pods for the cluster (in this case, a single-node minikube cluster). The demodeploy-6df58566f5-2p969 pod will remain in the Pending state until the scheduler can place it somewhere. If you encounter this error, make sure your cluster isn’t overloaded, for example by scaling the deployment back down, as shown below.
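Scaling down to a number of replicas the node can actually hold lets the scheduler place the remaining pods:

kubectl scale deployment demodeploy --replicas=3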

Other pod errors

When running kubectl get pods, you’ll naturally encounter other errors, and kubectl describe pod <podname> can get you more information. Some possible errors include:

  • RunContainerError. Occurs when the container inside the pod can’t start due to the application’s configuration inside the container.
  • KillContainerError. Occurs when the container running inside the pod fails to stop (or be killed).
  • SetupNetworkError. Occurs if there is an issue setting up the network for the pod.
  • CrashLoopBackOff. Occurs when your pods repeatedly crash in an endless loop after starting. It can be caused by an issue with the application inside the container, misconfigured pod or container parameters, or errors while creating your Kubernetes cluster, such as using the same port for two or more containers in a pod. Checking the previous container’s logs, as shown after this list, often reveals the cause.
  • Pods stuck in an error state. Occurs when the scheduler has scheduled the pod, but the pod fails to start. kubectl describe pod can help solve this.
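For a pod stuck in CrashLoopBackOff, the logs of the previous (crashed) container instance are often the fastest route to the root cause; substitute your own pod name below:

kubectl logs <podname> --previous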

Pod level logging

In addition to describing your pods, you can also check your pods’ logs. To view the events that have occurred inside a pod, run kubectl logs <podname>.

Use kubectl get pods to list your pods, then run kubectl logs against any running pod.

The kubectl logs command will output something similar to what’s shown below.

ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:  relation "votes" does not exist at character 38
STATEMENT:  SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote

As you can see, the logs show what the application inside the pod is doing, which can help you understand how the container is running.
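If a pod runs more than one container, tell kubectl logs which container you mean; you can also follow the stream as new lines arrive. The names below are placeholders:

kubectl logs <podname> -c <container-name>   # logs from one specific container
kubectl logs -f <podname>                    # stream the logs continuously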

Using exec to debug

Let’s say you want to fix errors that might be caused by the application running inside your container or the misconfiguration of your container. However, you can’t access the application unless you’re inside the container. To get inside and fix the errors, you need to use the kubectl exec command.

Copy the pod definition below, save it inside any file of your choice, and create the pod with kubectl create -f <filename>.

apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name: redis-container   # the container name is arbitrary
      image: redis

To get a shell inside the Redis container running in demopod, run kubectl exec -it demopod -- /bin/bash. The -i and -t flags attach an interactive terminal to the container. The command will drop you into a prompt similar to the one shown below, from which you can inspect the container, including details like its environment variables.

root@demopod:/data#
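Once inside, a few quick checks can confirm that the application is healthy; redis-cli ships with the official Redis image:

env | grep REDIS   # inspect Redis-related environment variables
redis-cli ping     # should answer PONG if the server is up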

Other Kubernetes debugging tips

Your pods and containers might be running fine; however, you may not be able to access the pods externally (over the internet). If you can’t reach the application running inside your pods, it might be due to a misconfiguration in your service YAML file.

Copy the pod definition YAML content below into any file of your choice.

apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod              # the pod name and labels are illustrative
  labels:
    app: myapp
spec:
  containers:
    - name: nginx-container
      image: nginx
      ports:
        - containerPort: 80

Then, copy the service definition YAML content below into another file of your choice.

apiVersion: v1
kind: Service
metadata:
  name: service-myapp
spec:
  type: NodePort
  selector:
    app: myapp                 # must match the pod’s labels
  ports:
    - port: 80
      targetPort: 80           # the port the NGINX container listens on
      nodePort: 30008

Create both objects by running kubectl create -f <filename> for each file.

Finally, run kubectl get service to ensure your service has been created, as shown below.

NAME            TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)        AGE
kubernetes      ClusterIP   10.96.0.1     <none>        443/TCP        10d
service-myapp   NodePort    10.96.95.52   <none>        80:30008/TCP   31s

To get the service URL, run minikube service service-myapp --url.

When you open the link, you should see the default NGINX welcome page. If you don’t, there’s an issue with your service. The error can occur if the service’s targetPort doesn’t match the port the container is actually listening on. Rectify this by specifying matching port numbers in the service and pod YAML configuration files.

Another possible cause of the error is that you didn’t specify the correct selector inside your service YAML file; checking the service’s endpoints quickly confirms this, as shown below.
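If the selector doesn’t match any pod’s labels, the service will have no endpoints to route traffic to. A quick way to verify:

kubectl get endpoints service-myapp      # an empty ENDPOINTS column means no matching pods
kubectl get pods --selector=app=myapp    # list the pods the selector actually matches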

Output of minikube service

Listing the API version

If the apiVersion of a Kubernetes object is misconfigured, you may be using an API version that’s outdated or invalid for that object type. To troubleshoot this, list the apiVersion for each object and compare it against what your cluster supports.
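You can check which API versions the cluster actually serves, and which version a given kind expects, straight from kubectl:

kubectl api-versions                     # list every API version the cluster supports
kubectl explain deployment | head -n 3   # show the kind and expected version for Deployments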

A tool called Move2Kube uses source artifacts like Docker Compose files or Cloud Foundry manifests to generate Kubernetes deployment artifacts, including object YAML, Helm charts, and operators. You can install Move2Kube with its install script, shown below (check the project’s documentation for the current command).

bash <(curl https://raw.githubusercontent.com/konveyor/move2kube/main/scripts/install.sh)

Next, run move2kube collect. The command gathers information about your cluster and stores it in an m2k_collect folder.

Navigate into the m2k_collect folder and find the clusters folder, which holds the collected information about your Kubernetes cluster. Inside it is a file with a name like minikube--77f0e6522d6f6d24.yaml. View that file’s contents; the output is long, but you can use it to double-check that you’re using the right apiVersion for each object.

Final thoughts

This article has introduced you to several ways to debug your Kubernetes clusters, pods, and containers. With a few simple kubectl commands, you can solve problems that range from a misconfigured API version to a pod that references an image that doesn’t exist in your registry.

If you're looking to build an internal UI to help monitor and debug your Kubernetes clusters, pods, and containers, consider Airplane. Airplane is the developer platform for building custom internal tools. With Airplane Views, you can build a robust monitoring dashboard within minutes using Airplane's pre-built component library and template library.

To get started and build your first internal UI within minutes, sign up for a free account or book a demo.

Nate Matherson
Nate Matherson is the Co-founder & CEO of ContainIQ. An experienced entrepreneur and technologist, he has founded multiple venture-backed companies and is a two-time Y Combinator Alum.
