How to detect and diagnose pods with problems in OpenShift

Tags: OpenShift

Pods are the basic unit of deployment in Kubernetes. Each pod contains one or more containers that share the same network namespace and storage volumes.

In this post you will learn different ways to diagnose pods with problems.

The commands in this post were run on a computer with Ubuntu 22.04.1 LTS connected to the OpenShift 4.11 Developer Sandbox. The commands for Kubernetes and other container orchestrators are similar to the ones in this post.

The best option: Use Health Probes

Health Probes run a command inside the container or make an HTTP request to check the status of an application. They are the standard Kubernetes mechanism to detect and resolve problems automatically.

There are three types of Health Probes:

  • Startup Probes: Run when the container starts. Readiness and liveness probes do not begin until the startup probe succeeds. Their purpose is to detect when the application has finished starting.
  • Readiness Probes: Report whether the pod is ready to receive network traffic.
  • Liveness Probes: Report whether the application is still healthy.

Health Probes are defined at the container level in the YAML file of the Deployment or DeploymentConfig.

All Health Probes run periodically, every periodSeconds seconds, and may have an initial delay (initialDelaySeconds) so that they do not fail while the application is still starting.
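As a rough sketch of what such a definition can look like, the following snippet writes a container-level fragment with the three probes to a file. The container name and port 8081 come from the sample application used later in this post. The /healthz/... paths and every timing value are assumptions chosen for illustration, not settings taken from a real deployment.

```shell
# Write an illustrative probe fragment to a file for review.
# Assumptions: the /healthz/* paths and all timing values are made up;
# port 8081 is the port the sample application in this post listens on.
cat > probes-fragment.yaml <<'EOF'
containers:
  - name: dotnet-basic
    startupProbe:
      httpGet:
        path: /healthz/started
        port: 8081
      periodSeconds: 5
      failureThreshold: 12
    readinessProbe:
      httpGet:
        path: /healthz/ready
        port: 8081
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 2
      successThreshold: 1
      failureThreshold: 3
    livenessProbe:
      httpGet:
        path: /healthz/live
        port: 8081
      periodSeconds: 10
      timeoutSeconds: 2
      failureThreshold: 3
EOF
```

The fragment belongs under spec.template.spec of the Deployment. With these example values the startup probe gives the application up to 60 seconds (12 runs, 5 seconds apart) to start.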

A run is considered failed when one of these conditions is met:

  • The command returns a non-zero exit code.
  • The status code of the HTTP response is not between 200 and 399.
  • The probe execution takes longer than the number of seconds set in the timeoutSeconds property.
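To make these rules concrete, here is a minimal sketch that judges one HTTP probe run by hand with curl. The helper names status_is_success and check_http_probe are hypothetical, they are not part of oc or Kubernetes, and the real check is done by the node, not by a script like this.

```shell
# Mimic how a single HTTP probe run is judged.
# Assumption: both helper names are hypothetical, for illustration only.

# Succeeds (exit code 0) when the status code is between 200 and 399.
status_is_success() {
  [ "$1" -ge 200 ] && [ "$1" -le 399 ]
}

# Runs one probe attempt against a URL with a timeout in seconds.
check_http_probe() {
  url=$1
  timeout_seconds=${2:-1}
  # --max-time makes curl give up after the timeout, like timeoutSeconds.
  status=$(curl --silent --output /dev/null --max-time "$timeout_seconds" \
    --write-out '%{http_code}' "$url") || return 1
  status_is_success "$status"
}

# Example (run from a place that can reach the application):
#   check_http_probe http://localhost:8081/ 2 && echo "probe ok"
```

A redirect (3xx) counts as a success, while a 404 or a 500 counts as a failure, and so does a request that does not finish before the timeout.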

When a startup or liveness probe fails failureThreshold times in a row, the container is considered unhealthy, and the kubelet on the OpenShift node where the pod is located applies the pod's restartPolicy, which normally restarts the container.

When a readiness probe succeeds successThreshold times in a row, the pod is marked as ready and starts receiving network traffic. After failureThreshold consecutive failures it is marked as not ready, and the traffic is sent to the other ready pods, or rejected if there are no pods ready.
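One way to see the effect of these thresholds on a running pod is sketched below. The helper name pod_probe_status is hypothetical. It relies on the READY and RESTARTS columns printed by oc get pod and on the Unhealthy events that are recorded when a probe fails.

```shell
# Show the visible effects of probe results for one pod.
# Assumption: pod_probe_status is a hypothetical helper name.
pod_probe_status() {
  pod=$1
  # READY reflects the readiness probe; RESTARTS grows when the
  # liveness or startup probe makes the container restart.
  oc get pod "$pod"
  # Failed probe runs are recorded as events with reason "Unhealthy".
  oc get events --field-selector "involvedObject.name=$pod,reason=Unhealthy"
}

# Example:
#   pod_probe_status dotnet-basic-68f86bc847-brhqh
```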

This is the ideal scenario, but there are errors that health probes can’t detect, such as those that prevent new pods from being created, or failures in application paths that no probe covers.

Check events

Kubernetes Events record the recent changes in the life cycle of the resources in a namespace, such as deployments and pods.

Use events to detect issues with quotas, problems with container images, errors when mounting volumes, and other problems that prevent pods from being created.

You can get all the events with the command oc get events, or narrow them down with the --field-selector option.

The next example shows the events of a pod named ‘dotnet-basic-68f86bc847-brhqh’:

branyac@ubuntu-builder:~$ oc get event --field-selector involvedObject.name=dotnet-basic-68f86bc847-brhqh
LAST SEEN   TYPE     REASON           OBJECT                              MESSAGE
36m         Normal   Scheduled        pod/dotnet-basic-68f86bc847-brhqh   Successfully assigned branyac-dev/dotnet-basic-68f86bc847-brhqh to ip-10-0-204-219.ec2.internal by ip-10-0-199-46
36m         Normal   AddedInterface   pod/dotnet-basic-68f86bc847-brhqh   Add eth0 [] from openshift-sdn
36m         Normal   Pulling          pod/dotnet-basic-68f86bc847-brhqh   Pulling image "image-registry.openshift-image-registry.svc:5000/branyac-dev/dotnet-basic@sha256:e9ce82808ae9277d67844fab31723de4110ad083b9d7419e40a8cfef3535e99c"
36m         Normal   Pulled           pod/dotnet-basic-68f86bc847-brhqh   Successfully pulled image "image-registry.openshift-image-registry.svc:5000/branyac-dev/dotnet-basic@sha256:e9ce82808ae9277d67844fab31723de4110ad083b9d7419e40a8cfef3535e99c" in 9.490165333s
36m         Normal   Created          pod/dotnet-basic-68f86bc847-brhqh   Created container dotnet-basic
36m         Normal   Started          pod/dotnet-basic-68f86bc847-brhqh   Started container dotnet-basic
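All the events above are of type Normal. When something is wrong, it usually helps to hide them and look only at warnings. The following sketch wraps two common filters in small functions; the function names pod_events and warning_events are hypothetical.

```shell
# Two event filters that are often useful when diagnosing a namespace.
# Assumption: both function names are hypothetical helpers.

# Events that refer to a single pod.
pod_events() {
  oc get events --field-selector "involvedObject.name=$1"
}

# Only Warning events, sorted so that the most recent ones come last.
warning_events() {
  oc get events --field-selector type=Warning --sort-by=.lastTimestamp
}

# Examples:
#   pod_events dotnet-basic-68f86bc847-brhqh
#   warning_events
```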

Check the logs

Applications normally write warnings and error messages to the console. You can read them by following the pod log with the command oc logs -f [PODNAME]

The next example shows how to attach to the log of the pod ‘dotnet-basic-68f86bc847-brhqh’:

branyac@ubuntu-builder:~$ oc logs -f dotnet-basic-68f86bc847-brhqh
warn: Microsoft.AspNetCore.DataProtection.Repositories.FileSystemXmlRepository[60]
      Storing keys in a directory '/opt/app-root/.aspnet/DataProtection-Keys' that may not be persisted outside of the container. Protected data will be unavailable when container is destroyed.
warn: Microsoft.AspNetCore.DataProtection.KeyManagement.XmlKeyManager[35]
      No XML encryptor configured. Key {1be0e376-b09a-4294-a999-a9c3a1a3fa22} may be persisted to storage in unencrypted form.
info: Microsoft.Hosting.Lifetime[14]
      Now listening on: http://[::]:8081
info: Microsoft.Hosting.Lifetime[0]
      Application started. Press Ctrl+C to shut down.
info: Microsoft.Hosting.Lifetime[0]
      Hosting environment: Development
info: Microsoft.Hosting.Lifetime[0]
      Content root path: /opt/app-root/src/bin/Release/net6.0/publish/
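Following the log only shows the container that is running now. When a container has been restarted, or the log is very long, a few extra options of oc logs can help. This is a small sketch; the function names and the default values (10 minutes, 100 lines) are assumptions.

```shell
# Variations of "oc logs" for restarted containers and long logs.
# Assumption: both function names and the defaults are hypothetical.

# Log of the previous instance of the container, useful after a restart.
crashed_container_logs() {
  oc logs --previous "$1"
}

# Only recent output: limited by age (--since) and by line count (--tail).
recent_logs() {
  oc logs --since="${2:-10m}" --tail="${3:-100}" "$1"
}

# Examples:
#   crashed_container_logs dotnet-basic-68f86bc847-brhqh
#   recent_logs dotnet-basic-68f86bc847-brhqh 1h 20
```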

Applications can also write log files. You can copy a file or an entire folder from the pod to your computer with the command oc cp [PODNAME]:[REMOTEPATH] [LOCALPATH]

The next example copies the file ‘/tmp/log.txt’ from a pod to the file ‘tmp.txt’ in the local working directory:

branyac@ubuntu-builder:~$ oc cp dotnet-basic-68f86bc847-brhqh:/tmp/log.txt ./tmp.txt
tar: Removing leading `/' from member names
branyac@ubuntu-builder:~$ ls -la tmp.txt
-rw-rw-r-- 1 branyac branyac 0 jan  1 00:10 tmp.txt
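The tar message in the output appears because oc cp uses the tar binary inside the container, so the image must include it. Copying a whole folder works the same way as copying a file. The sketch below assumes a hypothetical folder /tmp/logs in the pod and a hypothetical helper name copy_pod_dir.

```shell
# Copy an entire folder from a pod to the local machine.
# Assumptions: copy_pod_dir is a hypothetical helper, and /tmp/logs
# in the example is a made-up path.
copy_pod_dir() {
  pod=$1
  remote_dir=$2
  local_dir=${3:-./pod-files}
  oc cp "$pod:$remote_dir" "$local_dir"
}

# Example:
#   copy_pod_dir dotnet-basic-68f86bc847-brhqh /tmp/logs ./logs
```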

Connect to the console

You can run additional commands inside the pod, such as connection tests or specific commands to check application health, by opening a remote shell with oc rsh [PODNAME]
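For a single check you do not need an interactive shell, because oc exec runs one command in the pod and returns. The sketch below requests the application from inside its own pod. The helper name pod_http_check is hypothetical, port 8081 is the one shown in the log above, and it assumes that curl is available in the container image, which may not be the case.

```shell
# Run one HTTP check from inside the pod.
# Assumptions: pod_http_check is a hypothetical helper, and the
# container image includes curl.
pod_http_check() {
  pod=$1
  url=${2:-http://localhost:8081/}
  # Everything after "--" is the command executed inside the pod.
  oc exec "$pod" -- curl --silent --show-error --max-time 5 "$url"
}

# Examples:
#   oc rsh dotnet-basic-68f86bc847-brhqh          # interactive shell
#   pod_http_check dotnet-basic-68f86bc847-brhqh  # single command
```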



Sergio Monedero
