System Resource Monitoring

Overview

We often need to monitor a system during load testing to determine how heavily it is being taxed (e.g. to choose the proper AWS instance type) and which resources are being hit (memory, CPU, etc.). The target system often has monitoring features (e.g. Prometheus); however, sometimes no monitoring tool is available and, more often, we don't want the overhead of the monitoring tool itself to skew the load test.

The typical workflow is to spin up a server, deploy the application, and hit the server with a load test (locust.io, JMeter, ab, etc.). Adding node_exporter to an instance takes a few seconds but is useless if Prometheus is not set up or reachable from the instance. This usually leads to running the load test and periodically checking the instance's metrics by hand as the test runs, which is not an optimal solution.

Our quest to automate this task had two key requirements:

  • easy to install (few dependencies)
  • small footprint (if the monitor itself takes 50% of memory/CPU, it's not really helpful)

sarmore

Our first attempt is basic but meets both requirements. sarmore uses sar, which is already installed on most Linux instances and may already be running.

sarmore is very basic: it simply starts three sar instances (one each for CPU, memory, and load), which run for a given amount of time and log to the specified LOG_DIR.
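The approach can be sketched in Python as follows. This is not sarmore's actual code (which may well be a shell script); LOG_DIR, INTERVAL, DURATION, REPORTS, and the function names are hypothetical. It relies on sar's standard report flags -u (CPU), -r (memory), and -q (run queue and load averages) in the `sar [options] interval count` form, and needs sysstat installed to actually run.

```python
import os
import subprocess

# Hypothetical settings: sarmore's real variable names and defaults may differ.
LOG_DIR = "/tmp/sarmore"
INTERVAL = 5      # seconds between samples
DURATION = 600    # total seconds to monitor

# sar report flags: -u (CPU), -r (memory), -q (run queue / load average)
REPORTS = {"cpu": "-u", "memory": "-r", "load": "-q"}


def build_commands(interval, duration):
    """Return one sar command line per report type."""
    count = max(1, duration // interval)  # number of samples sar should take
    return {name: ["sar", flag, str(interval), str(count)]
            for name, flag in REPORTS.items()}


def start_monitors(log_dir, interval, duration):
    """Start one background sar process per report, each logging to its own file."""
    os.makedirs(log_dir, exist_ok=True)
    procs = []
    for name, cmd in build_commands(interval, duration).items():
        log = open(os.path.join(log_dir, f"{name}.log"), "w")
        procs.append(subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT))
        log.close()  # the child process keeps its own copy of the descriptor
    return procs
```

For example, `for p in start_monitors(LOG_DIR, INTERVAL, DURATION): p.wait()` starts the three reports in parallel and blocks until all of them finish.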

Quick Summary: no install, may already be running

Benefits

  • typically no install
  • may already be running

Drawbacks

  • output format is awkward to parse and work with
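One way to soften this drawback is to convert sar's text report into structured records. The Python sketch below is illustrative only and assumes the common sysstat layout: a banner line, repeated column-header rows, timestamped sample rows, and an `Average:` summary, with `.` as the decimal separator. Formatting varies across sysstat versions and locales, so it may need adjusting; `parse_sar` and `_to_value` are hypothetical helper names.

```python
def _to_value(token):
    """Convert numeric tokens to float; leave labels such as 'all' as strings."""
    try:
        return float(token)
    except ValueError:
        return token


def parse_sar(text):
    """Parse a plain-text sar report into one dict per sample row."""
    headers, rows = None, []
    for line in text.splitlines():
        tokens = line.split()
        # Skip blank lines, summary lines, and restart markers.
        if not tokens or tokens[0] == "Average:" or "RESTART" in tokens:
            continue
        # Timestamps look like "13:00:01" or "01:00:01 PM" depending on locale.
        if len(tokens) > 1 and tokens[1] in ("AM", "PM"):
            stamp, fields = " ".join(tokens[:2]), tokens[2:]
        else:
            stamp, fields = tokens[0], tokens[1:]
        if stamp.count(":") != 2 or not fields:
            continue  # e.g. the "Linux ..." banner line
        if isinstance(_to_value(fields[-1]), str):
            headers = fields  # column headers repeat throughout long reports
        elif headers and len(fields) == len(headers):
            row = {"time": stamp}
            row.update(zip(headers, map(_to_value, fields)))
            rows.append(row)
    return rows
```

Each returned row maps sar's own column names (e.g. `%idle`, `ldavg-1`) to values, which makes the logs easy to load into a spreadsheet or plotting tool.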

pymore

Our second attempt was a Python-based solution, since Python is available on most Linux instances. pymore runs in a loop, polling system resources at a given interval and storing the results in log files.
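A stdlib-only Python sketch of such a polling loop follows, assuming a Linux host with /proc. It is not pymore's actual code: the function names, the single `resources.log` file, and the chosen metrics (CPU busy % from /proc/stat, MemAvailable from /proc/meminfo, and `os.getloadavg()`) are illustrative assumptions.

```python
import os
import time


def parse_meminfo(text):
    """Map each /proc/meminfo field to its value (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.split():
            info[key] = int(value.split()[0])
    return info


def parse_cpu_line(line):
    """Return (idle, total) jiffies from the aggregate 'cpu' line of /proc/stat."""
    times = [int(x) for x in line.split()[1:9]]  # user..steal; guest is already in user
    idle = times[3] + times[4]                   # idle + iowait
    return idle, sum(times)


def cpu_percent(prev, cur):
    """Busy CPU percentage between two (idle, total) samples."""
    d_idle, d_total = cur[0] - prev[0], cur[1] - prev[1]
    return 100.0 * (1 - d_idle / d_total) if d_total else 0.0


def read_cpu():
    with open("/proc/stat") as f:
        return parse_cpu_line(f.readline())


def monitor(log_dir, interval, samples):
    """Poll CPU, memory, and load every `interval` seconds and append to a log."""
    os.makedirs(log_dir, exist_ok=True)
    prev = read_cpu()
    with open(os.path.join(log_dir, "resources.log"), "a") as log:
        for _ in range(samples):
            time.sleep(interval)
            cur = read_cpu()
            with open("/proc/meminfo") as f:
                mem = parse_meminfo(f.read())
            load1, load5, load15 = os.getloadavg()
            log.write(
                f"{time.strftime('%Y-%m-%dT%H:%M:%S')} "
                f"cpu={cpu_percent(prev, cur):.1f}% "
                f"mem_available_kb={mem.get('MemAvailable', 0)} "
                f"load={load1:.2f},{load5:.2f},{load15:.2f}\n"
            )
            log.flush()  # keep the log current if the run is interrupted
            prev = cur
```

For example, `monitor("/tmp/pymore", interval=5, samples=120)` would sample for about ten minutes. CPU usage is computed from the difference between consecutive /proc/stat readings, because the counters are cumulative since boot.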

Quick Summary: clone repo and run (no dependencies)

Benefits

  • clone repo, no app dependencies
  • more detailed, customizable metrics can be captured

Drawbacks

  • requires git/cloning
  • memory usage is relatively high compared with sarmore and rumore

rumore

Our third attempt is a Rust app that ships and runs as a single binary. rumore works much like pymore: it runs in a loop and writes to log files.

Quick Summary: put binary on server and run

Benefits

  • single binary (no dependencies)
  • low memory usage
  • more detailed, customizable metrics can be captured

Drawbacks

  • requires putting binary on server

Summary

All three solutions have benefits and drawbacks that should be weighed for each application. We typically use rumore where possible because of its light resource usage and ease of install.

Resource Utilization

The resource utilization of the monitors themselves is as follows:

Application  Bin Size  VIRT (KiB)  RES (KiB)  SHR (KiB)  CPU%
sarmore      n/a       16,140      2,292      2,076      0.0
pymore       n/a       19,992      11,628     6,028      0.0
rumore       1.8M      2,928       888        768        0.0

Note that the sarmore figures above are for running it directly; if sar is already running, you do not need sarmore at all and can simply query sar's existing data.
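Memory figures like those in the table can be reproduced without top. The following Python sketch reads /proc/<pid>/status on Linux; treating VmSize as VIRT, VmRSS as RES, and RssFile + RssShmem as SHR is an approximation of what top reports (RssFile and RssShmem require Linux 4.5 or later), and it is not necessarily how the figures above were collected. CPU% is omitted because it requires sampling over time. `parse_status` and `process_memory` are hypothetical names.

```python
def parse_status(text):
    """Approximate top's VIRT/RES/SHR (kB) from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key in ("VmSize", "VmRSS", "RssFile", "RssShmem"):
            fields[key] = int(value.split()[0])
    return {
        "VIRT": fields.get("VmSize"),
        "RES": fields.get("VmRSS"),
        # Shared resident memory; RssFile/RssShmem need Linux 4.5 or later.
        "SHR": fields.get("RssFile", 0) + fields.get("RssShmem", 0),
    }


def process_memory(pid):
    """Read approximate memory figures for a running process on Linux."""
    with open(f"/proc/{pid}/status") as f:
        return parse_status(f.read())
```

For example, `process_memory(1234)`, where 1234 stands in for the monitor's PID, returns a dict with `VIRT`, `RES`, and `SHR` keys in kB.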

Feedback

We welcome feedback and enhancements to any of the three applications, keeping in mind the two key requirements. Feel free to fork one (or all) and submit a PR. We would also be interested if you are willing to write similar logic in another language (such as Java or C/C++) to compare against sarmore, pymore, and rumore.