Debugging Your Kubernetes Cluster, Pods, and Containers

When something goes wrong, it’s often the DevOps engineer who’s responsible for detecting and solving the issue immediately. While Kubernetes has many benefits when it comes to speed and resource efficiency, its added complexity can make debugging more difficult. In order to resolve problems efficiently and avoid interruptions for end users, it’s vitally important that you understand how to debug a Kubernetes cluster.

In this article, you’ll learn how to quickly and effectively debug your cluster, nodes, pods, and containers.

Cluster level debugging

Let’s say there’s an issue with your Kubernetes cluster. Because a cluster is made up of several components, such as nodes and the control plane, a problem with any one of them can affect the whole cluster. To debug the cluster, try the suggestions in the following sections.

Obtaining information about your clusters

The first step toward debugging your cluster is to gather more information about its components. Run the following command:

kubectl cluster-info

The kubectl cluster-info command outputs information about the status of the control plane and CoreDNS. In this example, the output showed no issues with the control plane of the cluster and confirmed that CoreDNS was running correctly.

To get more detailed information about your cluster, you can run:

kubectl cluster-info dump

The output of this command is not included here due to its length.

The kubectl cluster-info dump command gives detailed information about your cluster and the activities carried out on it.

Getting the status of your node

An unhealthy node is a problem that affects the cluster as a whole. To get the status of your nodes, run:

kubectl get nodes

The command’s output shows each node’s name, status, roles, age, and the Kubernetes version it runs.

Below, the node named minikube has the Ready status, so it is running fine. If the status of any of your nodes is NotReady, you can assume there’s an issue with that node.

NAME	STATUS	ROLES	AGE	VERSION
minikube	Ready	control-plane,master	3d12h	v1.22.2
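On a cluster with many nodes, scanning this table by eye gets tedious. The following is a minimal sketch of a shell filter that prints only the nodes whose status is something other than Ready. The function name not_ready_nodes is hypothetical, and the sketch assumes the default column layout of kubectl get nodes.

```shell
# Print the name and status of every node whose STATUS column is not exactly "Ready".
# Reads the output of `kubectl get nodes` on standard input.
# A cordoned node ("Ready,SchedulingDisabled") is also reported, which is usually worth knowing.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1, $2 }'
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl get nodes | not_ready_nodes
```

If the function prints nothing, every node reported Ready.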

Getting the health status of cluster components

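Kubernetes clusters have different components, like the scheduler, the controller manager, and etcd. Knowing the health status of these components will save you time when debugging your cluster. To get the health status of your cluster components, run:

kubectl get componentstatuses

Note that the ComponentStatus API has been deprecated since Kubernetes v1.19, so on newer clusters its output may not reflect the real health of a component.

As you can see below, the scheduler is reported as unhealthy, while the controller manager and etcd are healthy.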

NAME	STATUS	MESSAGE	ERROR
scheduler	Unhealthy	Get "http://127.0.0.1:10251/healthz": dial tcp 127.0.0.1:10251:	connect: connection refused
controller-manager	Healthy	OK
etcd-0	Healthy	{"health":"true","reason":""}
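To pull out only the components that need attention, the same output can be filtered. This is a small sketch under the assumption that the STATUS value is the second column, as in the table above; the function name unhealthy_components is hypothetical.

```shell
# Print the name of each component whose STATUS column is not "Healthy".
# Reads the output of `kubectl get componentstatuses` on standard input.
unhealthy_components() {
  awk 'NR > 1 && $2 != "Healthy" { print $1 }'
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl get componentstatuses | unhealthy_components
```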

Getting activities in your cluster

Viewing the events that have taken place on your cluster is another effective way to debug it. Events let you spot any error that occurred while a particular action was carried out in a namespace.

To get the events that occurred in the current namespace, you can run the following command (add --all-namespaces to see events from every namespace):

kubectl get events

The output below shows the details of each event, including its type, the reason it occurred, and the object involved.

40s	Normal	Scheduled	pod/worker-deploy-799b5fb489-7c4sx	Successfully assigned default/worker-deploy-799b5fb489-7c4sx to minikube
37s	Normal	Pulled	pod/worker-deploy-799b5fb489-7c4sx	Container image "kodekloud/examplevotingapp_worker:v1" already present on machine
35s	Normal	Created	pod/worker-deploy-799b5fb489-7c4sx	Created container worker-app
35s	Normal	Started	pod/worker-deploy-799b5fb489-7c4sx	Started container worker-app
41s	Normal	Killing	pod/worker-deploy-799b5fb489-gc9xr	Stopping container worker-app
41s	Normal	SuccessfulCreate	replicaset/worker-deploy-799b5fb489	Created pod: worker-deploy-799b5fb489-7c4sx
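On a busy cluster most events are of type Normal, which can bury the ones that matter. The sketch below keeps the header row and drops every Normal event; the function name warning_events is hypothetical, and the filter assumes the default column layout shown above, with the type in the second column.

```shell
# Keep the header line plus every event whose TYPE column is not "Normal".
# Reads the output of `kubectl get events` on standard input.
warning_events() {
  awk 'NR == 1 || $2 != "Normal"'
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl get events --sort-by=.metadata.creationTimestamp | warning_events
#
# kubectl can also filter on the server side:
#   kubectl get events --field-selector type=Warning
```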

Pods and container debugging

If your cluster and nodes are healthy but you still have issues, it’s time to examine the pods and the containers running inside them. You may have mistakenly tried to run a pod with a nonexistent image, for example. The suggestions below are a helpful starting point.

Describing pods

You may have issues with your pods due to problems with the containers inside them. When you run kubectl get pods, you might notice a status like ImagePullBackOff or Pending instead of the normal Running status. The kubectl describe pod podname command can help you quickly get to the root of the trouble. Let’s dive into these two examples.

Pods with ImagePullBackOff error status

To simulate a pod with the ImagePullBackOff status, create a pod that uses the nonexistent image busybox888. Copy the YAML content below and store it in a file named pods.yaml. Then, create the pod object with kubectl create -f pods.yaml.
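A manifest for this example could look like the following sketch, written to pods.yaml from the shell. The pod name demopod and the image busybox888 come from the surrounding text; the container name and the sleep command are assumptions added only to make the manifest complete.

```shell
# Write a minimal pod manifest that references the nonexistent image busybox888.
cat > pods.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name: demo-container          # hypothetical container name
      image: busybox888             # intentionally wrong image name
      command: ["sleep", "3600"]    # assumption: would keep the container alive if the image existed
EOF

# Then create the pod (requires a running cluster):
#   kubectl create -f pods.yaml
```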

Get the pods running on your cluster with kubectl get pods. You can see the output of this command below. Notice that the demopod has ImagePullBackOff status.

NAME	READY	STATUS	RESTARTS	AGE
demopod	0/1	ImagePullBackOff	0	2m13s
postgres-deploy-8695749f5f-nnd67	1/1	Running	3 (3h4m ago)	9d
redis-deploy-5d7988b4bb-9kntq	1/1	Running	3 (3h4m ago)	9d
result-app-deploy-b8f4fc44b-8phh2	1/1	Running	3 (3h4m ago)	9d
voting-app-deploy-547678ccc7-67sh5	1/1	Running	3 (3h4m ago)	9d
worker-deploy-799b5fb489-7c4sx	1/1	Running	0	37m

After seeing the ImagePullBackOff status, you need to identify its cause correctly. The kubectl describe pod command gives a comprehensive overview of a pod.

To get an overview of demopod, run:

kubectl describe pod demopod

If you scroll down after running the command, you’ll notice the output below showing the reason for the error. The error occurred because Kubernetes couldn’t pull the image from the Docker registry. The ImagePullBackOff error can occur if the image doesn’t exist or is hosted in a private registry that’s not accessible to Kubernetes.

The error can also occur when the image exists only on your local machine and is not hosted on a registry that the cluster can reach. Kubernetes will try to pull the image multiple times without success, leading to an ImagePullBackOff error.

In the output of the kubectl describe pod demopod command, you will notice an ErrImagePull error. This error occurs when pulling the image for the container that will run inside the pod fails. Notice also the ImagePullBackOff error, which means that after repeated ErrImagePull errors, Kubernetes is waiting for an increasing delay before it retries the pull.

In this scenario, the failure occurred because the busybox888 image doesn’t exist. Each time Kubernetes tries to pull the busybox888 image, the attempt fails with ErrImagePull, and the repeated failures result in the ImagePullBackOff status.

Events:

TYPE	REASON	AGE	FROM	MESSAGE
----	------	----	----	-------
Normal	Scheduled	27m	default-scheduler	Successfully assigned default/demopod to minikube
Normal	Pulling	25m (x4 over 27m)	kubelet	Pulling image "busybox888"
Warning	Failed	25m (x4 over 27m)	kubelet	Failed to pull image "busybox888": rpc error: code = Unknown desc = Error response from daemon: pull access denied for busybox888, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
Warning	Failed	25m (x4 over 27m)	kubelet	Error: ErrImagePull
Warning	Failed	25m (x4 over 27m)	kubelet	Error: ImagePullBackOff
Normal	BackOff	2m26s (x101 over 27m)	kubelet	Back-off pulling image "busybox888"
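When a pod has several containers, it helps to extract just the image names that failed to pull. The following sketch does that with sed; the function name failed_images is hypothetical, and it assumes the kubelet message keeps the wording shown in the events above.

```shell
# Print each distinct image name that appears in a "Failed to pull image" event.
# Reads the output of `kubectl describe pod <name>` on standard input.
failed_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl describe pod demopod | failed_images
```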

The correct name for the image is busybox, not busybox888. You’ll need to change the image name from busybox888 to busybox.

If you’re sure that the image name is correct but you still see ImagePullBackOff, the error could have occurred because the container image is hosted in a private registry. To resolve this, store the registry credentials in a Secret and reference that Secret from the pod’s imagePullSecrets field.
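As a rough illustration of the private-registry case, the sketch below writes a pod manifest that references a pull secret. Everything specific in it is hypothetical: the file name private-pod.yaml, the pod and container names, the registry registry.example.com, the image path, and the Secret name regcred.

```shell
# Write a pod manifest that pulls from a private registry using a pull secret.
cat > private-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: private-demo                              # hypothetical pod name
spec:
  containers:
    - name: app                                   # hypothetical container name
      image: registry.example.com/team/app:1.0    # hypothetical private image
  imagePullSecrets:
    - name: regcred                               # must match the name of an existing Secret
EOF

# The Secret can be created beforehand with kubectl (placeholder values):
#   kubectl create secret docker-registry regcred \
#     --docker-server=registry.example.com \
#     --docker-username=<user> \
#     --docker-password=<password>
```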

Pods with pending status

When you run the kubectl get pods command, you can sometimes see pods with the Pending status. To simulate that scenario, create a deployment with 1,000 replicas in your cluster.

Copy the content below and save it in a file named deployment.yaml. If you are reusing an existing deployment file, make sure you delete its previous contents.
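A deployment that reproduces this scenario might look like the following sketch. The deployment name demodeploy is inferred from the pod names in the output further down; the label app: demo, the container name, the busybox image, and the sleep command are assumptions chosen only to give the scheduler many pods to place.

```shell
# Write a deployment manifest that asks for far more pods than one node can hold.
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demodeploy
spec:
  replicas: 1000
  selector:
    matchLabels:
      app: demo                     # hypothetical label
  template:
    metadata:
      labels:
        app: demo                   # must match the selector above
    spec:
      containers:
        - name: demo                # hypothetical container name
          image: busybox            # assumption: any small image works here
          command: ["sleep", "3600"]
EOF

# Then apply it (requires a running cluster):
#   kubectl apply -f deployment.yaml
```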

Run kubectl apply -f deployment.yaml to apply the changes to your deployment. Then run kubectl get pods to list your pods. Notice that a pod called demodeploy-6df58566f5-2p969 has the Pending status.

NAME	READY	STATUS	AGE
demodeploy-6df58566f5-26jwq	0/1	Terminating	12m
demodeploy-6df58566f5-2p969	0/1	Pending	5m43s
demodeploy-6df58566f5-2vwn9	0/1	Pending	5m39s
demodeploy-6df58566f5-2xftp	0/1	Pending	4s
demodeploy-6df58566f5-4jrvc	0/1	Pending	7s
demodeploy-6df58566f5-4rp6m	0/1	Pending	5m42s
demodeploy-6df58566f5-4xb7n	0/1	Pending	5m33s
demodeploy-6df58566f5-4zpsk	0/1	Pending	5m50s

To find out why the pod demodeploy-6df58566f5-2p969 is pending, run kubectl describe pod demodeploy-6df58566f5-2p969. The kubectl describe pod command gets you detailed information about any pod, including one with the Pending status. In your case, the pending pod will have a different name, so substitute that name in the command.

After running the command, scroll down to the bottom of its output. You’ll see the information shown below.

Status:	Pending
kube-api-access-np625:
Type:	Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds:	3607
ConfigMapName:	kube-root-ca.crt
ConfigMapOptional:	<nil>
DownwardAPI:	true;
QoS Class:	BestEffort
Node-Selectors:	<none>
Tolerations:	node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
	node.kubernetes.io/unreachable:NoExecute op=Exists for 300s

Events:

Type	Reason	Age	From	Message
----	-----	----	----	-------
Warning	FailedScheduling	5m44s (x4 over 6m37s)	default-scheduler	0/1 nodes are available: 1 Too many pods.

The result shows that there are too many pods on the cluster’s only node (in this case, the cluster is running on minikube). Each node can run only a limited number of pods (110 by default), so the demodeploy-6df58566f5-2p969 pod will remain in the Pending state until capacity becomes available. If you encounter this error, reduce the number of replicas or add nodes to the cluster.
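To see how far over capacity the cluster is, you can count pods by status and compare the total with what the node can hold. The sketch below is one way to do the counting; the function name pods_by_status is hypothetical, and it assumes the status is the third column of kubectl get pods.

```shell
# Count pods per value of the STATUS column.
# Reads the output of `kubectl get pods` on standard input.
pods_by_status() {
  awk 'NR > 1 { count[$3]++ } END { for (s in count) print s, count[s] }' | sort
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl get pods | pods_by_status
#
# The number of pods the minikube node can accept:
#   kubectl get node minikube -o jsonpath='{.status.allocatable.pods}'
```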

Other pod errors

When running kubectl get pods, you may encounter other errors. The kubectl describe pod podname command can get you more information. Some possible errors include:

RunContainerError. Occurs when the container inside the pod can’t start due to the application’s configuration inside the container.
KillContainerError. Occurs when the container running inside the pod fails to stop (or be killed).
SetupNetworkError. Occurs if there is an issue setting up the network for the pod.
CrashLoopBackOff. Occurs when a container in your pod repeatedly crashes after starting, and Kubernetes waits longer and longer before restarting it. It can be caused by an issue with the application inside the container, misconfigured parameters of the pod or container, or errors made while creating your Kubernetes cluster. One example is using the same port for two or more containers in a pod.
Pods stuck in an error state. Occurs when the scheduler has scheduled the pod, but the pod fails to start. kubectl describe pod can help solve this.
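Whatever the specific error, a quick first step is to list only the pods that are not healthy. The following sketch filters the STATUS column for anything other than Running or Completed; the function name problem_pods is hypothetical.

```shell
# Print the name and status of every pod that is neither Running nor Completed.
# Reads the output of `kubectl get pods` on standard input.
problem_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl get pods | problem_pods
#
# kubectl can filter by phase on the server side, but a pod in CrashLoopBackOff
# still has the phase Running, so it would be missed by:
#   kubectl get pods --field-selector=status.phase!=Running
```

Filtering on the displayed status rather than the phase is a deliberate choice here, because the status column is where errors such as ImagePullBackOff and CrashLoopBackOff appear.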

Pod level logging

In addition to describing your pods, you can also check your pods’ logs. To view what the containers in a pod have logged, run kubectl logs podname.

Use kubectl get pods to list all your pods and then run kubectl logs anyrunningpodname.

The kubectl logs command will output something similar to what’s shown below.

ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote
ERROR:	relation "votes" does not exist at character 38
STATEMENT:	SELECT vote, COUNT(id) AS count FROM votes GROUP BY vote

As you can see, the logs show what is happening inside the pod, which can help you understand how the container is running. Here, the application keeps querying a votes relation that does not exist in the database.
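When the same error repeats hundreds of times, a summary is easier to read than the raw log. The sketch below counts each distinct line that contains ERROR: and lists the most frequent first; the function name error_summary is hypothetical, and the ERROR: marker is taken from the sample log above, so other applications may need a different pattern.

```shell
# Count identical log lines that contain "ERROR:" and print them, most frequent first.
# Reads log text on standard input.
error_summary() {
  awk '/ERROR:/ { seen[$0]++ } END { for (line in seen) print seen[line], line }' | sort -rn
}

# Usage (assumes kubectl is configured for your cluster):
#   kubectl logs podname | error_summary
#
# For a container that has already restarted, read the previous instance's log:
#   kubectl logs --previous podname | error_summary
```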

Using exec to debug

Let’s say you want to fix errors that might be caused by the application running inside your container or the misconfiguration of your container. However, you can’t access the application unless you’re inside the container. To get inside and fix the errors, you need to use the kubectl exec command.

Copy the pod definition below and save it in any file of your choice. Then create the pod with kubectl create -f filename.
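A pod definition for this exercise could be as small as the sketch below. The pod name demopod and the use of Redis come from the text that follows; the file name redis-pod.yaml, the container name, and the untagged redis image are assumptions.

```shell
# Write a minimal manifest for a pod that runs a single Redis container.
cat > redis-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: demopod
spec:
  containers:
    - name: redis                   # hypothetical container name
      image: redis                  # assumption: official Redis image, default tag
EOF

# If a pod named demopod already exists from an earlier example, delete it first:
#   kubectl delete pod demopod
# Then create the new pod (requires a running cluster):
#   kubectl create -f redis-pod.yaml
```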

To get a shell in the Redis container running inside demopod, run kubectl exec -it demopod -- /bin/bash. The -i flag keeps standard input open and the -t flag allocates a terminal, which together give you an interactive session. Once the command succeeds, you’re inside the Redis container and can proceed to check other information, such as environment variables.

Other Kubernetes debugging tips

Your pods and containers might be running fine, yet you may not be able to access the pods externally (over the internet). If you can’t access the application running inside your pods, it might be due to a misconfiguration in your service YAML file.

Copy the pod definition YAML content below into any file of your choice.
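A pod definition suitable for this example might look like the following sketch. The NGINX image is implied by the text further down; the file name myapp-pod.yaml, the pod name, the container name, and the label app: myapp are all hypothetical.

```shell
# Write a manifest for an NGINX pod that a service can select by label.
cat > myapp-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod                   # hypothetical pod name
  labels:
    app: myapp                      # hypothetical label; the service selector must match it
spec:
  containers:
    - name: nginx                   # hypothetical container name
      image: nginx
      ports:
        - containerPort: 80         # port NGINX listens on by default
EOF
```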

Then, copy the service definition YAML content below into another file of your choice.
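A matching service definition could look like this sketch. The service name service-myapp, the NodePort type, and the 80:30008 port mapping are taken from the kubectl get service output shown later; the file name and the selector app: myapp are hypothetical, and the selector has to match the labels on your pod.

```shell
# Write a NodePort service manifest that forwards node port 30008 to port 80 on the pod.
cat > service-myapp.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: service-myapp
spec:
  type: NodePort
  selector:
    app: myapp                      # hypothetical; must match the pod's labels
  ports:
    - port: 80                      # port exposed inside the cluster
      targetPort: 80                # port the container listens on
      nodePort: 30008               # port opened on every node
EOF
```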

Create both objects by running kubectl create -f filename for each file.

Finally, run kubectl get service to ensure your service has been created, as shown below.

NAME	TYPE	CLUSTER-IP	EXTERNAL-IP	PORT(S)	AGE
10d
service-myapp	NodePort	10.96.95.52	<none>	80:30008/TCP	31s

To get the service URL, run minikube service service-myapp --url.

When you open the link, you should see the output of NGINX in your browser. If you can’t see it, then there’s an issue with your service. The error can occur if the service’s targetPort is different from the port the container listens on. Rectify this by specifying the correct port number in the service and pod YAML configuration files.

Another possible cause of the error is that the selector inside your service YAML file doesn’t match the labels on your pod.

Listing the API version

If the API version of your Kubernetes objects has been misconfigured, you may be using a different API version that’s outdated or inaccurate. To troubleshoot this, list the apiVersion for each object.
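For manifests you already have on disk, a quick way to list the apiVersion and kind of each object is a small awk filter, sketched below. The function name list_api_versions is hypothetical, and the filter assumes both fields start at the beginning of a line, as they do for top-level objects.

```shell
# Print the top-level apiVersion and kind lines of each manifest, prefixed by the file name.
list_api_versions() {
  awk '/^(apiVersion|kind):/ { print FILENAME ": " $0 }' "$@"
}

# Usage:
#   list_api_versions pods.yaml deployment.yaml
#
# Compare the result with what the cluster actually serves:
#   kubectl api-versions
#   kubectl api-resources
```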

A tool called Move2Kube takes source artifacts like Docker Compose or Cloud Foundry manifest files and generates Kubernetes deployment artifacts from them, including object YAML, Helm charts, and operators. You can install Move2Kube with the command below.

Next, run move2kube collect. You will see output similar to that below, which shows that the information has been collected.

You’ll see something similar to what is shown below.

Navigate into the m2k_collect folder and find the clusters folder (the clusters folder holds collected information about your Kubernetes cluster). In the clusters folder is a file with the name minikube--77f0e6522d6f6d24.yaml. Open that file to view its content. While the output is longer than what can be shown here, you can use it to double-check that you’re using the right apiVersion.

Final thoughts

This article has introduced you to several ways to debug your Kubernetes clusters, pods, and containers. With a few simple kubectl commands, you can solve problems that range from a misconfigured API version to a pod that references an image that doesn’t exist in the Docker registry.

