How to detect and diagnose pods with problems in OpenShift
Pods are the basic unit of deployment in Kubernetes. Each pod contains one or more containers that share the same network namespace and storage.
In this post you will learn the different ways to diagnose pods with problems.
The commands in this post were run on a computer with Ubuntu 22.04.1 LTS connected to the OpenShift 4.11 Developer Sandbox. The commands for Kubernetes and other container orchestrators are similar to the ones in this post.
The best option: Use Health Probes
Health Probes run commands inside the container or make HTTP calls to check the status of an application. They are the standard Kubernetes mechanism to detect and resolve problems automatically.
There are three types of Health Probes:
- Startup Probes: Run when the container starts, before the readiness and liveness probes. Their purpose is to detect when the application has finished starting.
- Readiness Probes: Report whether the pod is ready to receive network traffic.
- Liveness Probes: Report whether the application is still healthy.
Health Probes are defined at container level in the YAML file of the deployment or deploymentConfig.
All Health Probes run periodically at intervals of seconds (periodSeconds) and may have an initial delay (initialDelaySeconds) to prevent false negatives.
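As a rough sketch of what such a definition could look like, the shell function below prints a patch that adds the three probes to a container. The container name dotnet-basic and port 8081 come from the examples later in this post; the /healthz path, the /tmp/healthy file, the Deployment name, and every timing value are assumptions chosen for illustration, not settings of the real application.

```shell
# Print a hypothetical strategic merge patch that adds three probes to the
# container "dotnet-basic". Paths and timing values are illustrative guesses.
probes_patch() {
  cat <<'EOF'
spec:
  template:
    spec:
      containers:
      - name: dotnet-basic
        startupProbe:
          httpGet:
            path: /healthz        # assumed endpoint
            port: 8081
          periodSeconds: 5
          failureThreshold: 12    # roughly 60 seconds to finish starting
        readinessProbe:
          httpGet:
            path: /healthz        # assumed endpoint
            port: 8081
          initialDelaySeconds: 5
          periodSeconds: 10
          timeoutSeconds: 2
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          exec:
            command: ["cat", "/tmp/healthy"]   # assumed marker file
          periodSeconds: 10
          timeoutSeconds: 2
          failureThreshold: 3
EOF
}

# Possible usage against a Deployment named dotnet-basic (also an assumption):
#   probes_patch > probes-patch.yaml
#   oc patch deployment dotnet-basic --patch-file probes-patch.yaml
```

The function only prints text, so it is safe to run without a cluster; nothing changes until the patch is applied with oc patch.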
A run is considered failed when one of these conditions is met:
- When the command returns an error (a non-zero exit code).
- When the status code of the HTTP response is not between 200 and 399.
- When the probe execution exceeds the number of seconds set in the timeoutSeconds property.
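The failure conditions above can be approximated from a terminal. The sketch below is only an imitation of a single HTTP probe run using curl; the function names are made up for this post, and the real probes are run by the kubelet, not by a script.

```shell
# Succeed (exit code 0) when an HTTP status code counts as a passing run,
# that is, when it is between 200 and 399.
http_status_passes() {
  [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Imitate one HTTP probe run.
#   $1 = URL to call
#   $2 = timeout in seconds, playing the role of timeoutSeconds (default 1)
# The run fails when curl errors out, times out, or the status is not 200-399.
http_probe_run() {
  code=$(curl -s -o /dev/null --max-time "${2:-1}" -w '%{http_code}' "$1") || return 1
  http_status_passes "$code"
}

# Possible usage (the URL is a placeholder):
#   http_probe_run http://localhost:8081/ 2 && echo "run passed" || echo "run failed"
```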
When the number of consecutive failed runs of a startup or liveness probe reaches the value of failureThreshold, the container is considered unhealthy, and the kubelet on the OpenShift node where the pod is located kills it and applies the restartPolicy, which normally restarts the container.
When the number of consecutive successful runs of a readiness probe reaches the value of successThreshold, the pod is marked as ready and starts receiving ingress network traffic. While it is not ready, the ingress traffic is routed to other pods, or rejected if no pods are ready.
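To make the counting concrete, here is a small simulation of how consecutive failures add up against failureThreshold. It is a sketch of the rule as commonly documented for Kubernetes, not code taken from the kubelet, and the function name is invented for this example.

```shell
# Print the number of the probe run that would trigger a restart, or "none".
#   $1         = failureThreshold
#   $2, $3 ... = results of successive runs, each "ok" or "fail"
restart_after() {
  threshold=$1
  shift
  failures=0
  run=0
  for result in "$@"; do
    run=$((run + 1))
    if [ "$result" = "fail" ]; then
      failures=$((failures + 1))
    else
      failures=0   # a successful run resets the streak
    fi
    if [ "$failures" -ge "$threshold" ]; then
      echo "$run"
      return 0
    fi
  done
  echo "none"
}

# Possible usage:
#   restart_after 3 fail fail ok fail fail fail
```

With a threshold of 3, the call in the usage comment prints 6: the success on the third run resets the streak, so the restart is only triggered after three more failures in a row.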
Health probes cover the ideal scenario, but there are errors they can’t detect, such as those that prevent new pods from being created, or failures in application paths that no probe checks.
Check events
Kubernetes Events record the history of changes in the life cycle of the deployments and pods in a namespace.
Use events to detect issues with quotas, problems with container images, errors when mounting volumes, and other problems that prevent pods from being created.
You can get all the events with the command oc get events or filter them with a field selector.
The next example shows the events of a pod named ‘dotnet-basic-68f86bc847-brhqh’:
branyac@ubuntu-builder:~$ oc get event --field-selector involvedObject.name=dotnet-basic-68f86bc847-brhqh
LAST SEEN TYPE REASON OBJECT MESSAGE
36m Normal Scheduled pod/dotnet-basic-68f86bc847-brhqh Successfully assigned branyac-dev/dotnet-basic-68f86bc847-brhqh to ip-10-0-204-219.ec2.internal by ip-10-0-199-46
36m Normal AddedInterface pod/dotnet-basic-68f86bc847-brhqh Add eth0 [10.129.5.157/23] from openshift-sdn
36m Normal Pulling pod/dotnet-basic-68f86bc847-brhqh Pulling image "image-registry.openshift-image-registry.svc:5000/branyac-dev/dotnet-basic@sha256:e9ce82808ae9277d67844fab31723de4110ad083b9d7419e40a8cfef3535e99c"
36m Normal Pulled pod/dotnet-basic-68f86bc847-brhqh Successfully pulled image "image-registry.openshift-image-registry.svc:5000/branyac-dev/dotnet-basic@sha256:e9ce82808ae9277d67844fab31723de4110ad083b9d7419e40a8cfef3535e99c" in 9.490165333s
36m Normal Created pod/dotnet-basic-68f86bc847-brhqh Created container dotnet-basic
36m Normal Started pod/dotnet-basic-68f86bc847-brhqh Started container dotnet-basic
branyac@ubuntu-builder:~$
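When a namespace has many events, a couple of small wrappers can save typing. The sketch below is one possible way to do it; the function names are invented for this post, and sorting by creation timestamp is a choice made here rather than something the commands above require.

```shell
# Show the events of one pod, oldest first.
#   $1 = pod name
pod_events() {
  oc get events --field-selector "involvedObject.name=$1" --sort-by=.metadata.creationTimestamp
}

# Show only events of type Warning in the current namespace, oldest first.
warning_events() {
  oc get events --field-selector type=Warning --sort-by=.metadata.creationTimestamp
}

# Possible usage:
#   pod_events dotnet-basic-68f86bc847-brhqh
#   warning_events
```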
Check the logs
Applications normally write warnings and error messages to the console. You can read them by attaching to the pod log with the command oc logs -f [PODNAME]
The next example shows how to attach to the log of the pod ‘dotnet-basic-68f86bc847-brhqh’:
branyac@ubuntu-builder:~$ oc logs -f dotnet-basic-68f86bc847-brhqh
warn: Microsoft.AspNetCore.DataProtection.Repositories.FileSystemXmlRepository[60]
Storing keys in a directory '/opt/app-root/.aspnet/DataProtection-Keys' that may not be persisted outside of the container. Protected data will be unavailable when container is destroyed.
warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
No XML encryptor configured. Key {1be0e376-b09a-4294-a999-a9c3a1a3fa22} may be persisted to storage in unencrypted form.
info: Microsoft.Hosting.Lifetime[14]
Now listening on: http://[::]:8081
info: Microsoft.Hosting.Lifetime[0]
Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
Hosting environment: Development
info: Microsoft.Hosting.Lifetime[0]
Content root path: /opt/app-root/src/bin/Release/net6.0/publish/
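Long logs are easier to scan when only the problem entries are shown. The sketch below assumes the log format seen in the example above, where each entry starts with a level prefix such as warn: or info: and its message follows on the next line; the function names and the default of 50 lines are assumptions made for illustration.

```shell
# Read a log from standard input and keep only the warn, fail, and crit
# entries, together with the message lines that follow each of them.
log_problems() {
  awk '
    /^(warn|fail|crit):/ { print; show = 1; next }   # start of a problem entry
    /^[a-z]+: /          { show = 0 }                # start of any other entry
    show                 { print }                   # message lines of a problem entry
  '
}

# Print the last lines of the previous instance of a container, which helps
# when the container was restarted after a crash.
#   $1 = pod name, $2 = number of lines (default 50)
previous_logs() {
  oc logs --previous --tail="${2:-50}" "$1"
}

# Possible usage:
#   oc logs dotnet-basic-68f86bc847-brhqh | log_problems
#   previous_logs dotnet-basic-68f86bc847-brhqh 100
```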
Applications can also write log files. You can copy a file or an entire folder from the pod to your computer with the command oc cp [PODNAME]:[REMOTEPATH] [LOCALPATH]
The next example copies the file ‘/tmp/log.txt’ from a pod to the local working directory as ‘tmp.txt’:
branyac@ubuntu-builder:~$ oc cp dotnet-basic-68f86bc847-brhqh:/tmp/log.txt ./tmp.txt
tar: Removing leading `/' from member names
branyac@ubuntu-builder:~$ ls -la tmp.txt
-rw-rw-r-- 1 branyac branyac 0 jan 1 00:10 tmp.txt
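The oc cp command relies on the tar binary being present in the container, as the tar message in the output above hints. For images without tar, one possible workaround for a single text file is to print it with cat and redirect the output locally. This is a sketch with an invented function name, and it assumes the image provides cat.

```shell
# Copy one file out of a pod without using tar.
#   $1 = pod name, $2 = path inside the pod, $3 = local destination file
fetch_file() {
  oc exec "$1" -- cat "$2" > "$3"
}

# Possible usage:
#   fetch_file dotnet-basic-68f86bc847-brhqh /tmp/log.txt ./tmp.txt
```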
Connect to the console
You can run additional commands, such as connection tests or application-specific health checks, by opening a remote shell in the pod with oc rsh [PODNAME]
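For a quick one-off check, the same kind of test can be run without an interactive shell by using oc exec. The sketch below asks the application for its HTTP status from inside the pod. The function name is invented, port 8081 is taken from the log shown earlier, and it assumes that curl is installed in the container image, which is not guaranteed.

```shell
# Print the HTTP status code the application returns from inside the pod.
#   $1 = pod name
#   $2 = port (default 8081)
#   $3 = path (default /)
pod_http_status() {
  oc exec "$1" -- curl -s -o /dev/null -w '%{http_code}' "http://localhost:${2:-8081}${3:-/}"
}

# Possible usage:
#   pod_http_status dotnet-basic-68f86bc847-brhqh
#   pod_http_status dotnet-basic-68f86bc847-brhqh 8081 /healthz   # hypothetical path
```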