Troubleshooting Guide
Triage Performance Degradation Symptoms
Swimlane performance degradation symptoms can arise from a variety of causes, including:

- Platform limitations
- Client code limitations
- Client job/data load upon the platform
- Large record sets
- Suboptimal backup configuration
- Hardware limitations
- Any combination of these

Examples include:

- A data entry lag (see: Report a Data Entry Lag in Swimlane Records)
- UI navigation and/or page load speed slowdowns; these sometimes manifest as HTTP 502 and 504 errors reported in Chrome Developer Tools
- Poor performance when searching or exporting records
- Timing out of Python script tasks that interact with the RESTful API; these also sometimes manifest as HTTP 502 and 504 errors
- Background job queue congestion

To triage performance degradation symptoms:

1. Take a screen capture video of the symptom while it is manifesting.
2. For UI slowness, collect a HAR file (https://toolbox.googleapps.com/apps/har_analyzer/) from Chrome Developer Tools.
3. Use the Watershed script to export a copy of all of your apps/workflows, integration tasks, and other related artifacts. You can request a copy of this script from your Swimlane Professional Services engineer.
4. Capture the MongoDB query profiler data; work with your support representative on this.
5. If requested by your support representative, assist them in harvesting MongoDB diagnostic and log data. These data sets can sometimes be large and require special arrangements for the file transfers.
6. For background job queue congestion symptoms, see the instructions below.
7. Provide all of the artifacts above in the pertinent support portal ticket.

Tuning Scripts

Whenever possible, refactor your automations that target the RESTful API to:

- Minimize the number of round trips to the RESTful API, and remove superfluous calls to record.save() / record.patch().
- Minimize the load imposed on the RESTful API (and, by extension, the MongoDB database) in each round trip. Make smart use of the search APIs (see https://swimlane-python-driver.readthedocs.io/en/stable/examples/resources.html#search-records):
  - Use the limit parameter appropriately with the Swimlane driver's app.records.search().
  - Use app.reports.build() against large record sets.
  - Constrain the sorting order of results; this works best in cases where only the first few results in a sorted set are really needed.

Background Job Queue Congestion

The queue is congested whenever its dashboard (https://swimlane.com/knowledge-center/docs/administrator-guide/system-settings/background-jobs) shows that new jobs are being enqueued faster than previously enqueued jobs can finish processing. This symptom can be either the result of underlying performance problems or the cause of other performance degradation symptoms. When this symptom is severe (thousands or tens of thousands of enqueued jobs), it can prevent newly enqueued jobs from executing for hours or days.

To understand this symptom and its remedies, it's important to understand the queue itself. Each job is an instance of either:

- A built-in integration task (one example is the nightly directory services sync), or
- A customer-created integration task.

As shown in the queue's dashboard, each job passes through the following states:

1. Enqueued
2. Processing
3. Succeeded / Deleted (the Failed state is not used by Swimlane; failed jobs are grouped in the Deleted state)

There are other states, such as Scheduled, Awaiting, or Aborted, but these are not used often.

The quick fix to temporarily eliminate congestion is to work with your Swimlane support representative to purge the background job queue so that newer jobs can process. This works only for a short time.

!> Important!
Before purging the job queue, consider the ramifications:

- The recommended purge method will delete all job execution data from the queue.
- Be aware that no Swimlane records will be altered by the purge (but they may suffer indirect harm, as described below).
- The loss of knowledge of success/fail outcomes for completed jobs is often negligible, because information about Succeeded and Deleted jobs is purged automatically every night at midnight server time; Swimlane only retains this data for 24-48 hours. This same information can often be reconstructed from the Swimlane log stream (the Swimlane logs collection in MongoDB).
- However, the loss whose cost must be carefully considered is the elimination of the jobs in the Enqueued state. When thousands or tens of thousands of jobs are congested, the following questions must be addressed: Is it more harmful to leave these jobs enqueued, knowing that recent Swimlane alarm records will go under-enriched for hours or days? Or is it more harmful to eliminate all enqueued jobs so that subsequent jobs can start and finish more promptly?

To answer this question, consider the consequences of either leaving the recently ingested records under-enriched or putting forth special effort to back-fill the under-enriched records. Consult with your Professional Services engineer for assistance using the bulk edit feature and/or special-purpose scripts to catalyze enrichment of all records neglected during queue congestion.

Diagnosing the Queue

After stopping the tasks service(s) and purging the queue, disable all integration tasks. Decide on one small suite of tasks to enable. These tasks should all pertain to one use case (one security alarm type and its automation processing flow), but they may be only a subset of the tasks belonging to that use case. Enable only those chosen tasks, then monitor the job queue for 1-3 hours. Does Swimlane keep up with this reduced load of tasks?
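The "does Swimlane keep up?" check amounts to watching whether the backlog of Enqueued jobs trends upward over the monitoring window instead of draining. A minimal sketch of that decision, using illustrative job-count samples read from the background jobs dashboard (no Swimlane API is called here; the function name and sample values are hypothetical):

```python
def queue_is_congested(backlog_samples, tolerance=0):
    """Given periodic samples of the Enqueued job count (e.g. noted from
    the background jobs dashboard every few minutes over 1-3 hours),
    report whether the queue is falling permanently behind: the backlog
    ends higher than it started and never recovered in between."""
    if len(backlog_samples) < 2:
        return False
    start = backlog_samples[0]
    # Did the backlog ever drain back to its starting level?
    recovered = any(s <= start for s in backlog_samples[1:])
    return backlog_samples[-1] > start + tolerance and not recovered

# Illustrative monitoring windows:
print(queue_is_congested([120, 90, 40, 10, 0]))         # -> False (draining)
print(queue_is_congested([120, 400, 900, 2500, 8000]))  # -> True (falling behind)
```

A tolerance argument is included because small fluctuations around the starting backlog are normal; only a sustained climb indicates congestion.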
If Swimlane keeps up with the reduced load (by never falling permanently behind), add a few more tasks (completing the first use case's portfolio or adding a small second use case) and continue monitoring. Increase the load incrementally in this way until Swimlane falls permanently behind; at that point you know precisely where the ability of the Swimlane deployment to keep up has been surpassed. The record of which tasks were enabled, in what order, and during what span of time is the information you need to pass along (together with the other artifacts requested above) so that a solution can be provided.
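While diagnosing load, it is also worth revisiting the round-trip advice from the Tuning Scripts section above. The sketch below illustrates the idea with a stand-in record class that merely counts save() calls (it is not the Swimlane driver; the class and field names are hypothetical): setting several fields and saving once costs one RESTful API round trip, while saving after every field costs one round trip per field.

```python
class FakeRecord:
    """Stand-in for a Swimlane driver record; counts save() round trips."""
    def __init__(self):
        self.fields = {}
        self.save_calls = 0

    def __setitem__(self, key, value):
        self.fields[key] = value  # local change only, no network traffic

    def save(self):
        self.save_calls += 1  # in real code: one HTTP round trip

def enrich_chatty(record, enrichment):
    # Anti-pattern: one save() -- hence one round trip -- per field.
    for field, value in enrichment.items():
        record[field] = value
        record.save()

def enrich_batched(record, enrichment):
    # Preferred: set all fields locally, then save once.
    for field, value in enrichment.items():
        record[field] = value
    record.save()

enrichment = {"Severity": "High", "Analyst": "jdoe", "Status": "Triaged"}
a, b = FakeRecord(), FakeRecord()
enrich_chatty(a, enrichment)
enrich_batched(b, enrichment)
print(a.save_calls, b.save_calls)  # -> 3 1
```

Both versions leave the record with identical field values; only the number of round trips imposed on the RESTful API (and, by extension, MongoDB) differs.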