Quick way to analyze MongoDB frequent queries with tcpdump

MongoDB has an internal profiler, but it’s often too complex for a quick statistics to see what kind of queries the database is getting. Luckily there’s an easy way to get some quick statistics with tcpdump. Granted, these examples are pretty naive in terms of accuracy, but they are really fast to do and they do give out a lot of useful information.

Get top collections which are getting queries:

tcpdump dst port 27017 -A -s 1400 |grep query | perl -ne '/^.{28}([^.]+\.[^.]+)(.+)/; print "$1\n";' > /tmp/queries.txt
sort /tmp/queries.txt | uniq -c | sort -n -k 1 | tail

The first command will dump the beginning of each packet as string which goes into MongoDB and it will then filter out everything except queries. The perl regexp clause will pick the target collection name and print it to stdout. You should run this around 10 seconds and then stop it with Ctrl+C. The next command sorts this log and prints top collections to stdout. You can run these commands both in your MongoDB machine or in your frontend machine.

You can also get more detailed statistics about the query by looking at the tcpdump. For example you can spot keywords like $and, $or, $readPreference etc which can help you to determine what kind of queries there are. Then you can pick up the queries you might want to cache with memcached or redis, or maybe to move some queries to the secondary instances.

Check out also this nice tool called MongoMem, which can tell you how much each of your collections are stored in the physical memory (RRS). This is also known as the “hot data”.