# Node and cluster maintenance on a K8s Helm deployment
At some point you'll need to carry out maintenance on a node (for example, to upgrade the kernel, apply a security patch, upgrade the operating system, perform hardware maintenance, or take a VM snapshot), which may require a single-node or even a cluster-wide shutdown or reboot. It's critical that these events are handled gracefully in a Helm deployment of Swimlane on Kubernetes (K8s).

Definitions:

- **Node**: a virtual or physical server; can be a master or a worker.
- **Cluster**: a group of interconnected nodes.

## Single node maintenance

In brief, maintenance on a single node includes:

1. Gracefully shut down K8s components.
2. Perform the node maintenance.
3. Restart the node (if necessary).
4. Start K8s components again.

**Note:** If additional nodes require maintenance, it is recommended to perform the steps below for each node one at a time, or refer to the "Multiple nodes maintenance" section for cluster-wide maintenance.

1. Connect (SSH) to the node which requires maintenance.

2. Drain the node. The `kubectl drain` operation evicts all the pods from the node so that they are rescheduled onto other nodes. To do so, it 1) marks the node unschedulable to prevent new pods from being scheduled on it (equivalent to `kubectl cordon <node name>`), and 2) evicts or deletes all pods running on the node. For pods managed by a replica set (like MongoDB), each evicted pod is replaced by a new pod scheduled onto another node. Draining a node won't cause any downtime as long as MongoDB's resource utilization isn't so high that it can't run alongside another MongoDB pod during the maintenance window.

   **Tip:** Use `kubectl get nodes -o wide` to look up the node name.

   ```shell
   $ kubectl drain <node name> --ignore-daemonsets --delete-local-data
   ```

   **Note:** The `--ignore-daemonsets` flag lets the drain proceed even though DaemonSet-managed pods cannot be evicted, and `--delete-local-data` allows deleting pods whose local (`emptyDir`) data is not persisted to an actual disk somewhere. The command above may also need `--force` if there are pods not managed by a ReplicaSet, Job, DaemonSet, or StatefulSet.

3. Perform the necessary node maintenance (restart if needed).

4. Make the node schedulable again. Run the following command to uncordon the node, which marks the node as schedulable again:

   ```shell
   $ kubectl uncordon <node name>
   ```

5. Repeat the process above for any additional nodes that need maintenance work.

## Multiple nodes maintenance

It may become necessary to perform maintenance on all the nodes in a cluster at once. In such a scenario, follow the steps outlined below to gracefully stop K8s resources, perform the maintenance, and then restart the K8s resources.

**Note:** This method can also be used if you need to completely shut down all the nodes (for example, to save development cost when using a cloud provider).

### Step 1: Shutdown

Ensure you're on a master node or another machine which has `kubectl` configured to access the cluster.

Stop the Swimlane application.

**Tip:** `<release name>`: run the command `helm ls` to look up the Helm release name. `<chart name>` will be `swimlane/swimlane` if working with the chart remotely, otherwise `swimlane`.

```shell
$ helm upgrade <release name> <chart name> -f <path to values.yaml> --set api.replicaCount=0,tasks.replicaCount=0,web.replicaCount=0,mongodb-replicaset.replicas=0

# Example using a chart remotely
helm upgrade swimlane swimlane/swimlane -f values.yaml --set api.replicaCount=0,tasks.replicaCount=0,web.replicaCount=0,mongodb-replicaset.replicas=0

# Example using a chart locally
helm upgrade swimlane swimlane -f swimlane/values.yaml --set api.replicaCount=0,tasks.replicaCount=0,web.replicaCount=0,mongodb-replicaset.replicas=0
```
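Before cordoning the nodes, it can help to confirm that the scale-down has actually completed. The snippet below is a minimal sketch of one way to do that; the `swimlane` namespace and the `api`/`tasks`/`web`/`mongodb` name fragments are assumptions, so substitute the namespace and pod names of your own release.

```shell
# Minimal sketch: poll until the scaled-down pods have fully terminated.
# "swimlane" is an assumed namespace; substitute the namespace of your release.
while kubectl get pods -n swimlane --no-headers 2>/dev/null \
    | grep -qE 'api|tasks|web|mongodb'; do
  echo "Waiting for Swimlane pods to terminate..."
  sleep 10
done
echo "Scale-down complete; safe to cordon and drain."
```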
Make the nodes unschedulable. The `kubectl cordon <node name>` command marks a node unschedulable and prevents any new pods from being scheduled on it.

**Note:** The commands below are for a six-node cluster; update them to reflect the number of nodes in your cluster.

```shell
$ kubectl cordon <node1>
$ kubectl cordon <node2>
$ kubectl cordon <node3>
$ kubectl cordon <node4>
$ kubectl cordon <node5>
$ kubectl cordon <node6>
```

Drain the nodes to gracefully terminate all pods running on them.

**Note:** The commands below are for a six-node cluster; update them to reflect the number of nodes in your cluster.

```shell
$ kubectl drain <node1> --ignore-daemonsets --delete-local-data
$ kubectl drain <node2> --ignore-daemonsets --delete-local-data
$ kubectl drain <node3> --ignore-daemonsets --delete-local-data
$ kubectl drain <node4> --ignore-daemonsets --delete-local-data
$ kubectl drain <node5> --ignore-daemonsets --delete-local-data
$ kubectl drain <node6> --ignore-daemonsets --delete-local-data
```

### Step 2: Perform maintenance

Perform the necessary maintenance on all the nodes, then gracefully shut down and power down all nodes.

### Step 3: Startup

Power up all the nodes, then ensure you're on a master node or another machine which has `kubectl` configured to access the cluster.

Make the nodes schedulable again. The `kubectl uncordon <node name>` command marks a node schedulable and ready to take on new pods.

```shell
$ kubectl uncordon <node1>
$ kubectl uncordon <node2>
$ kubectl uncordon <node3>
$ kubectl uncordon <node4>
$ kubectl uncordon <node5>
$ kubectl uncordon <node6>
```

Start the Swimlane application.

**Tip:** `<release name>`: run the command `helm ls` to look up the Helm release name. `<chart name>` will be `swimlane/swimlane` if working with the chart remotely, otherwise `swimlane`.

```shell
$ helm upgrade <release name> <chart name> -f <path to values.yaml>

# Example using a chart remotely
helm upgrade swimlane swimlane/swimlane -f values.yaml

# Example using a chart locally
helm upgrade swimlane swimlane -f swimlane/values.yaml
```

### Step 4: Confirmation

Use the following methods to confirm that the cluster and application are back online:

1. Run `kubectl get nodes`.
2. Run `kubectl get pods -A`.
3. Log in to the Swimlane application.
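If you prefer to script the confirmation, the following is a minimal sketch, assuming a kubectl version recent enough to support `kubectl wait` and field selectors; it blocks until every node reports Ready, then lists any pod that is not Running or Completed.

```shell
# Minimal sketch of an automated post-maintenance check.
# Wait up to 5 minutes for every node to report Ready.
$ kubectl wait --for=condition=Ready node --all --timeout=300s

# List any pod that is still Pending, Failed, or otherwise unhealthy;
# an empty result (plus a working Swimlane login) means the cluster is back.
$ kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
```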