Post-Upgrade/Maintenance Checklist

Information Regarding This Page

This page is a recommended checklist to work through after upgrade or maintenance work has taken place on an instance.

Kubernetes Checks

- Check that all pods are running and that no pod has issues.
  - Run `kubectl get pods -A`.
  - Look for any CrashLoopBackOff states, container restarts, or errors.
- Check for any events that may have occurred.
  - Run `kubectl get events -A`.
  - Look for any Unhealthy, Warning, or error states.

Platform UI Checks

- Check that the UI is accessible and that you can log in to the instance.
  - Make sure that the login page displays.
  - Enter a username and password and log in to the instance.
  - When login succeeds, make sure no red box with "System Error" appears in the bottom right of the screen. If it does, check the logging tab, or log out, open the browser's network console, and log back in to review where the error message is coming from (network stack trace).
- Once on the instance, check multiple workspaces and dashboards, ensuring that they display on screen without any issues.
  - When navigating around workspaces and dashboards, make sure that dashboards display with data.
  - Note: Sometimes a dashboard may not show anything. This can be because it has not been set up.
- Check multiple application record spaces, confirming that records are displaying and accessible.
  - Click on a few records and make sure they load and display on the screen.
  - Look for any error message pop-ups or records that do not load.
  - Note: One indication of an issue can be record fields that appear empty when the record is first opened.
- Check the logging section for any new error messages.
  - In the logging section, look at the timestamps after the upgrade/maintenance work was completed and check for any new messages.
  - A good practice is to take a recent message, such as a task failure or config mapping error, and scroll back through the log to see whether it was already failing before the upgrade/maintenance took place.

Hangfire Checks

- Check that servers show up on the Hangfire page correctly.
  - Verify that the number of servers listed is correct for the system type.
- Verify that recurring jobs run as expected.
  - Navigate to the recurring jobs and check that they are running on time and that there are no errors or issues.
- Verify that jobs are running successfully.
  - Check the succeeded queue and verify that `IIntegrationJob.PerformIntegration` jobs are succeeding.
- Check for any new failing jobs that did not occur prior to the maintenance/upgrade taking place.
  - Look at what is currently failing and check whether it was failing before the upgrade/maintenance time.
  - For example, the upgrade started at 9:00 AM and finished at 10:00 AM, and a job fails at 10:30 AM. Look at jobs before 9:00 AM to see whether it was failing before the upgrade/maintenance.

Mongo Checks

- Check the replica status by running `rs.status()` inside a MongoDB instance.
  - This is to ensure that the Mongo pods are healthy and working as expected.
- Check the replica sync status by running `rs.printSecondaryReplicationInfo()` inside a MongoDB instance.
  - This is to ensure that the secondaries are syncing correctly and are up to date.

Turbine Agents

- Check that remote agents are active and connected.
  - Look at the remote agent tabs and verify that the agents are correct and active.
- Check that playbooks are running on remote agents.
  - Check that playbook runs assigned to the `$remote` pool are running successfully.

Shortened Checklist

Kubernetes Checks

- Check that all pods are running and that no pod has issues.
- Check for any events that may have occurred.

Platform UI Checks

- Check that the UI is accessible and that you can log in to the instance.
- Check that the dashboards display on screen without any issues.
- Check that application records are available and accessible.
- Check the logging section for any new error messages.

Hangfire Checks

- Check that servers show up on the Hangfire page correctly.
- Verify that recurring jobs run as expected.
- Verify that jobs are running successfully.
- Check for any new failing jobs that did not occur prior to the maintenance/upgrade taking place.

Mongo Checks

- Check `rs.status()`.
  - Ensure that there are no errors. It should show PRIMARY and SECONDARY members.
  - Check that `syncSourceHost` has a value populated for the secondary members.
- Check the `rs.printSecondaryReplicationInfo()` output to ensure that the Mongo containers are all syncing correctly.

Turbine Agents

- Check that remote agents are active and connected.
- Check that playbooks are running on remote agents.
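
The Kubernetes pod check in this checklist can be partly scripted. The following is a minimal Python sketch, not part of the original checklist, that scans the JSON produced by `kubectl get pods -A -o json` for the conditions the checklist names: pods that are not running, containers stuck in a waiting state such as CrashLoopBackOff, and container restarts. The function name `find_unhealthy_pods` and the `max_restarts` threshold are hypothetical names chosen for illustration.

```python
from typing import Any, Dict, List


def find_unhealthy_pods(pod_list: Dict[str, Any], max_restarts: int = 0) -> List[str]:
    """Return human-readable findings for pods that need a closer look.

    pod_list is the parsed output of `kubectl get pods -A -o json`.
    max_restarts is a hypothetical threshold: containers with more restarts
    than this are reported.
    """
    findings: List[str] = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        pod_id = f'{meta.get("namespace", "?")}/{meta.get("name", "?")}'
        status = pod.get("status", {})

        # A healthy pod is Running, or Succeeded for completed jobs.
        phase = status.get("phase", "Unknown")
        if phase not in ("Running", "Succeeded"):
            findings.append(f"{pod_id}: phase={phase}")

        for container in status.get("containerStatuses", []):
            container_name = container.get("name", "?")

            # A waiting container carries a reason such as CrashLoopBackOff.
            waiting = container.get("state", {}).get("waiting")
            if waiting:
                reason = waiting.get("reason", "unknown")
                findings.append(f"{pod_id}: container {container_name} waiting ({reason})")

            restarts = container.get("restartCount", 0)
            if restarts > max_restarts:
                findings.append(f"{pod_id}: container {container_name} restarted {restarts} times")
    return findings
```

To use it, save the pod list with `kubectl get pods -A -o json > pods.json`, load the file with `json.load`, and pass the result to the function. Restart counts are cumulative, so a reported restart may predate the upgrade/maintenance; compare against the pod's age or the event timestamps before treating it as a new issue.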