Collect MongoDB Collection and Index Sizes Across All Databases
This article explains how to collect MongoDB size statistics for all collections and indexes across all databases. The output is sorted with the largest collections first and can be saved to a file for review. This is useful when investigating disk usage growth or storage exhaustion, or when identifying which databases and collections are consuming the most space.

Overview

The script does the following:

- Loops through all databases
- Retrieves stats for every collection
- Captures collection size, storage size, index size, and total size
- Sorts the results from largest to smallest by total size
- Prints the output in a readable format

Notes

- size: Logical (uncompressed) size of the documents in the collection
- storageSize: Physical space allocated for document storage
- totalIndexSize: Total size of all indexes for the collection
- totalSize: Combined size of collection storage and indexes (storageSize + totalIndexSize)

Because Swimlane uses WiredTiger, collection data is compressed. As a result, storageSize may be smaller than size.

Steps

1. Connect to the Mongo shell from the mongo pod. Example:

```bash
export NS=default

# -it attaches an interactive terminal so you can type into mongosh.
kubectl -n "$NS" exec -it mongo-0 -- mongosh --quiet \
  -u admin \
  -p "$(kubectl -n "$NS" get secret mongo-admin -o jsonpath='{.data.password}' | base64 -d)" \
  --authenticationDatabase admin \
  --tls \
  --tlsAllowInvalidHostnames \
  --tlsAllowInvalidCertificates \
  admin
```

2. Run the following script:

```javascript
// Convert a byte count into a human-readable string (kB, MB, GB, ...).
function getReadableFileSizeString(fileSizeInBytes) {
  var i = -1;
  var byteUnits = [' kB', ' MB', ' GB', ' TB', ' PB', ' EB', ' ZB', ' YB'];
  do {
    fileSizeInBytes = fileSizeInBytes / 1024;
    i++;
  } while (fileSizeInBytes > 1024);
  return Math.max(fileSizeInBytes, 0.1).toFixed(1) + byteUnits[i];
}

var dbs = db.getMongo().getDBNames();
var results = [];

// Collect stats for every collection in every database.
dbs.forEach(function (dbName) {
  var currentDb = db.getMongo().getDB(dbName);
  currentDb.getCollectionNames().forEach(function (collName) {
    try {
      var s = currentDb.getCollection(collName).stats();
      results.push({
        ns: s.ns,
        count: s.count || 0,
        size: s.size || 0,
        storageSize: s.storageSize || 0,
        totalIndexSize: s.totalIndexSize || 0,
        totalSize: (s.storageSize || 0) + (s.totalIndexSize || 0)
      });
    } catch (e) {
      // Namespaces that cannot return stats (for example, views) are skipped.
      print("Skipping " + dbName + "." + collName + ": " + e);
    }
  });
});

// Sort largest first, then print. Chaining sort() into forEach() keeps the
// shell from echoing the whole sorted array before the formatted lines.
results.sort(function (a, b) {
  return b.totalSize - a.totalSize;
}).forEach(function (r) {
  print(
    r.ns +
    " | count=" + r.count +
    " | size=" + getReadableFileSizeString(r.size) +
    " | storageSize=" + getReadableFileSizeString(r.storageSize) +
    " | totalIndexSize=" + getReadableFileSizeString(r.totalIndexSize) +
    " | totalSize=" + getReadableFileSizeString(r.totalSize)
  );
});
```

3. Review the output. Example output:

```text
app.records | count=250000 | size=5.2 GB | storageSize=2.8 GB | totalIndexSize=1.4 GB | totalSize=4.2 GB
app.attachments | count=18000 | size=3.6 GB | storageSize=2.1 GB | totalIndexSize=512.0 MB | totalSize=2.6 GB
```

Save output to a file

If you want to save the output to a file, run the command below from the shell instead of entering mongosh interactively:

```bash
export NS=default
OUTPUT_FILE="mongo_collection_index_sizes_$(date +%Y%m%d_%H%M%S).txt"

# Use -i (not -it): the script is fed to mongosh through the heredoc on stdin.
kubectl -n "$NS" exec -i mongo-0 -- mongosh --quiet \
  -u admin \
  -p "$(kubectl -n "$NS" get secret mongo-admin -o jsonpath='{.data.password}' | base64 -d)" \
  --authenticationDatabase admin \
  --tls \
  --tlsAllowInvalidHostnames \
  --tlsAllowInvalidCertificates \
  admin <<'EOF' > "$OUTPUT_FILE"
function getReadableFileSizeString(fileSizeInBytes) {
  var i = -1;
  var byteUnits = [' kB', ' MB', ' GB', ' TB', ' PB', ' EB', ' ZB', ' YB'];
  do {
    fileSizeInBytes = fileSizeInBytes / 1024;
    i++;
  } while (fileSizeInBytes > 1024);
  return Math.max(fileSizeInBytes, 0.1).toFixed(1) + byteUnits[i];
}

var dbs = db.getMongo().getDBNames();
var results = [];

dbs.forEach(function (dbName) {
  var currentDb = db.getMongo().getDB(dbName);
  currentDb.getCollectionNames().forEach(function (collName) {
    try {
      var s = currentDb.getCollection(collName).stats();
      results.push({
        ns: s.ns,
        count: s.count || 0,
        size: s.size || 0,
        storageSize: s.storageSize || 0,
        totalIndexSize: s.totalIndexSize || 0,
        totalSize: (s.storageSize || 0) + (s.totalIndexSize || 0)
      });
    } catch (e) {
      print("Skipping " + dbName + "." + collName + ": " + e);
    }
  });
});

results.sort(function (a, b) {
  return b.totalSize - a.totalSize;
}).forEach(function (r) {
  print(
    r.ns +
    " | count=" + r.count +
    " | size=" + getReadableFileSizeString(r.size) +
    " | storageSize=" + getReadableFileSizeString(r.storageSize) +
    " | totalIndexSize=" + getReadableFileSizeString(r.totalIndexSize) +
    " | totalSize=" + getReadableFileSizeString(r.totalSize)
  );
});
EOF

echo "Output saved to $OUTPUT_FILE"
```

How to interpret the results

Use totalSize as the best quick view of how much space a collection and its indexes are consuming together.

- Large size with smaller storageSize: This usually indicates document compression.
- Large totalIndexSize: Indexes are contributing significantly to disk usage.
- Large storageSize with relatively small count: This may indicate larger document payloads or previously allocated space.

Use this process when

- MongoDB disk usage is growing unexpectedly
- The Mongo volume or mount point is nearing full capacity
- You need to identify the largest collections across the environment
- You want to compare document growth versus index growth

Additional recommendation

If disk usage is the main concern, collect this output along with:

```bash
df -h
lsblk
sudo fdisk -l
```

This helps correlate MongoDB collection growth with underlying disk and partition usage.
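The interpretation rules under "How to interpret the results" can also be applied programmatically. The following is a minimal sketch, not part of the original procedure. It assumes records shaped like the ones the collection script builds (ns, count, size, storageSize, totalIndexSize, totalSize, all in bytes). The helper names interpretCollection and totalsByDatabase, and both threshold values, are hypothetical illustrations rather than Swimlane guidance; adjust them for your environment.

```javascript
// Hypothetical thresholds for illustration only; tune for your environment.
var INDEX_SHARE_THRESHOLD = 0.5;       // flag when indexes are >= 50% of totalSize
var LARGE_AVG_DOC_BYTES = 1024 * 1024; // flag when storageSize per document is >= 1 MB

// Hypothetical helper: return short notes for one record
// ({ ns, count, size, storageSize, totalIndexSize, totalSize }, sizes in bytes).
function interpretCollection(r) {
  var notes = [];
  // Large size with smaller storageSize: usually WiredTiger compression.
  if (r.storageSize > 0 && r.size > r.storageSize) {
    notes.push("compressed " + (r.size / r.storageSize).toFixed(1) + "x");
  }
  // Large totalIndexSize: indexes are a big share of the collection's footprint.
  if (r.totalSize > 0 && r.totalIndexSize / r.totalSize >= INDEX_SHARE_THRESHOLD) {
    notes.push("index-heavy");
  }
  // Large storageSize relative to count: big documents or previously allocated space.
  // An empty collection (count 0) is judged on its whole storageSize.
  var storagePerDoc = r.count > 0 ? r.storageSize / r.count : r.storageSize;
  if (storagePerDoc >= LARGE_AVG_DOC_BYTES) {
    notes.push("large storage per document");
  }
  return notes;
}

// Hypothetical helper: sum totalSize per database (the part of ns before
// the first "."), sorted largest first.
function totalsByDatabase(records) {
  var totals = {};
  records.forEach(function (r) {
    var dbName = r.ns.split(".")[0];
    totals[dbName] = (totals[dbName] || 0) + r.totalSize;
  });
  return Object.keys(totals)
    .map(function (name) {
      return { db: name, totalSize: totals[name] };
    })
    .sort(function (a, b) {
      return b.totalSize - a.totalSize;
    });
}
```

In a mongosh session where the collection script has already run, you could paste these functions and then call `results.forEach(function (r) { print(r.ns + ": " + interpretCollection(r).join(", ")); })` to flag each collection, or `totalsByDatabase(results)` to see which databases use the most space. Because they are plain JavaScript with no shell-specific calls, they also run in Node.js against records of the same shape. A ratio-based index check is used instead of an absolute size so that small and large collections are judged on the same scale.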