14. Self-Healing

Get pod details

$ kubectl get pods -o wide

Get the name of the first nginx pod and delete it - one of the nginx pods should then show the ‘Terminating’ status

$ NGINX_POD=$(kubectl get pods -l app=nginx --output=jsonpath="{.items[0].metadata.name}")
$ kubectl delete pod $NGINX_POD; kubectl get pods -l app=nginx -o wide
$ sleep 10
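
The fixed ‘sleep 10’ above is only a guess at how long the replacement pod needs to start. As a hedged alternative, the sketch below blocks until the Deployment reports all of its replicas available again. ‘wait_for_rollout’ is a hypothetical helper name and the 60-second default is an assumption; ‘nginx-deployment’ is assumed to be the Deployment that owns the app=nginx pods. If the command runs before the Deployment status reflects the deleted pod, it may return immediately.

```shell
# Hypothetical helper: block until the Deployment has all replicas available,
# or fail once the timeout expires.
wait_for_rollout() {
  deployment="$1"
  timeout="${2:-60s}"   # assumed default, adjust to your cluster
  kubectl rollout status "deployment/${deployment}" --timeout="${timeout}"
}
```

Usage: ‘wait_for_rollout nginx-deployment’ in place of ‘sleep 10’.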

Get pod details - one nginx pod should be freshly started

$ kubectl get pods -l app=nginx -o wide

Get deployment details and check the events for recent changes

$ kubectl describe deployment nginx-deployment

Halt one of the nodes (node2)

$ vagrant halt node2
$ sleep 30

Get node details - node2 should show Status=NotReady

$ kubectl get nodes
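
‘kubectl get nodes’ prints NotReady both when a node reports Ready=False and when the control plane has lost contact with it (Ready=Unknown), which is the case for a halted VM. The following is a minimal sketch, with hypothetical helper names, that reads the raw Ready condition and polls for an expected value instead of relying on a fixed sleep. The retry count and the 2-second interval are assumptions.

```shell
# Print the status of the node's Ready condition: True, False or Unknown.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Poll until the Ready condition equals the wanted value.
# Returns 0 on success, 1 when the retries are used up.
wait_for_node_status() {
  node="$1"
  want="$2"
  tries="${3:-30}"   # assumed default: 30 tries, 2 seconds apart
  while [ "$tries" -gt 0 ]; do
    if [ "$(node_ready_status "$node")" = "$want" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 2
  done
  return 1
}
```

Usage: ‘wait_for_node_status node2 Unknown’ after halting the node.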

Get pod details - everything still looks fine, because the pods on node2 are not evicted until 5 minutes have passed

$ kubectl get pods -o wide

A pod is not evicted until its node has been not-ready or unreachable for 5 minutes (see Tolerations in ‘describe pod’). This prevents Kubernetes from spinning up new containers when it is not necessary, for example during a short node outage.

$ NGINX_POD=$(kubectl get pods -l app=nginx --output=jsonpath="{.items[0].metadata.name}")
$ kubectl describe pod $NGINX_POD | grep -A1 Tolerations
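
The ‘grep -A1’ above depends on the layout of the ‘describe’ output. As an alternative sketch, a JSONPath query can print the key, effect and tolerationSeconds of each toleration, one per line. On a cluster with default settings the entries are typically node.kubernetes.io/not-ready and node.kubernetes.io/unreachable, each with 300 seconds. ‘pod_tolerations’ is a hypothetical helper name.

```shell
# Print one line per toleration: key, effect, tolerationSeconds (tab-separated).
pod_tolerations() {
  kubectl get pod "$1" \
    -o jsonpath='{range .spec.tolerations[*]}{.key}{"\t"}{.effect}{"\t"}{.tolerationSeconds}{"\n"}{end}'
}
```

Usage: ‘pod_tolerations $NGINX_POD’.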

Sleep for 5 minutes

$ sleep 300

Get pod details - the pods on node2 show Status=Unknown/NodeLost and replacement containers were started

$ kubectl get pods -o wide
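
To focus on the pods affected by the outage, the listing can be restricted to the pods scheduled on a single node with a field selector. This is a small sketch; ‘pods_on_node’ is a hypothetical helper name.

```shell
# List only the pods that are bound to the given node.
pods_on_node() {
  kubectl get pods -o wide --field-selector "spec.nodeName=$1"
}
```

Usage: ‘pods_on_node node2’.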

Get deployment details - again AVAILABLE=3/3

$ kubectl get deployments -o wide

Power node2 back on

$ vagrant up node2
$ sleep 70
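
Instead of the fixed ‘sleep 70’, ‘kubectl wait’ can block until the node reports the Ready condition. The sketch below wraps it in a hypothetical helper, ‘wait_for_node_ready’, with an assumed default timeout of 120 seconds.

```shell
# Block until the node's Ready condition is True, or fail after the timeout.
wait_for_node_ready() {
  node="$1"
  timeout="${2:-120s}"   # assumed default, adjust to your environment
  kubectl wait --for=condition=Ready "node/${node}" --timeout="${timeout}"
}
```

Usage: ‘wait_for_node_ready node2’ after ‘vagrant up node2’.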

Get node details - node2 should be Ready again

$ kubectl get nodes

Get pod details - the ‘Unknown’ pods have been removed

$ kubectl get pods -o wide