Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged monitoring

Software Engineering

What is Cloud Native?

I hear a lot of people talking about cloud native applications these days. This includes technologists and business managers. I have found that there really is a spectrum of meaning for the term cloud native and that two people rarely mean the same thing when they say cloud native.

At one end of the spectrum is running a traditional workload on a virtual machine. In this scenario the virtual host may have been manually provisioned, manually configured and manually deployed. Its cloudiness comes from the fact that it's a virtual machine running in the cloud.

I tend to think of cloud native at the other end and propose the following definition:

The ability to provision and configure infrastructure, stage and deploy an application, and address the scale and health needs of the application in an automated, deterministic way, without human interaction

The activities necessary to accomplish the above are:

  • Provision
  • Configure
  • Build and Test
  • Deploy
  • Scale and Heal


Provision and Configure

The following diagram illustrates some of the workflow involved in provisioning and configuring resources for a cloud native application.

You’ll notice that there are some abstractions listed, including Heat for OpenStack, CloudFormation for AWS and even Terraform, which can provision against both OpenStack and AWS. You’ll also notice that I include a provision flow that produces an image rather than an actual running resource. This can be helpful when using IaaS directly, but becomes essential when using containers. The management of that image creation process should include a CI/CD pipeline and a versioned image registry (more about that another time).

Build, Test, Deploy

With provisioning defined it’s time to look at the application Build, Test and Deploy steps. These are depicted in the following figure:

The color of the “Prepare Infrastructure” activity should hint that in this process it represents the workflow shown above under Provision and Configure. For clarity, various steps have been grouped under the heading “Application Staging Process”. While these can occur independently (and unfortunately sometimes testing never happens), it’s helpful to think of those four steps as necessary to validate any potential release. It should be possible to fully automate the staging of an application.
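The stop-on-failure idea behind the staging process can be sketched in a few lines of Python. The step names and the pass/fail policy here are illustrative, not taken from any particular tool:

```python
# Sketch of a deterministic staging pipeline: each step must pass
# before the next runs, and any failure invalidates the release.
# Step names and the context dict are illustrative assumptions.
def run_pipeline(steps, context):
    """Run each named step in order; halt on the first failure."""
    for name, step in steps:
        if not step(context):
            return (False, name)   # release is not valid
    return (True, None)

steps = [
    ("prepare_infrastructure", lambda ctx: True),
    ("build",  lambda ctx: ctx.setdefault("artifact", "app-1.0.tgz") is not None),
    ("test",   lambda ctx: "artifact" in ctx),
    ("deploy", lambda ctx: True),
]
ok, failed_at = run_pipeline(steps, {})
```

The point of the sketch is the ordering guarantee: a deploy step can never run against an artifact that skipped testing.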


The discovery step is often still done manually, using configuration files or even hand edits after deployment. Discovery could include making sure application components know how to reach a database, or how a load balancer knows which application servers should receive traffic. In a cloud native application, this discovery should be fully automated. When using containers it becomes essential and very fluid. Some mechanisms that accommodate discovery include system level tools like etcd and DNS based tools like Consul.

Monitor and Heal or Scale

There are loads of monitoring tools available today. A cloud native application requires monitoring to be close to real time and needs to be able to act on monitoring outputs. This may involve creating new resources, destroying unhealthy resources and even shifting workloads around based on latency or other metrics.
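A minimal sketch of such a control loop in Python, with illustrative thresholds and a toy resource model (no real cloud API is involved):

```python
# Sketch of a monitor-and-heal/scale decision loop. The instance
# records, latency threshold and minimum healthy count are all
# illustrative assumptions, not a specific tool's API.
def reconcile(instances, latency_ms, max_latency_ms=200, min_healthy=2):
    """Return the actions a controller would take for this snapshot."""
    actions = []
    for inst in instances:
        if not inst["healthy"]:
            # heal: destroy the unhealthy resource, create a replacement
            actions.append(("destroy", inst["id"]))
            actions.append(("create", "replacement-for-" + inst["id"]))
    healthy = [inst for inst in instances if inst["healthy"]]
    if latency_ms > max_latency_ms or len(healthy) < min_healthy:
        # scale: add capacity when latency or healthy count degrades
        actions.append(("create", "scale-out"))
    return actions

instances = [{"id": "web-1", "healthy": True},
             {"id": "web-2", "healthy": False}]
actions = reconcile(instances, latency_ms=250)
```

In a real system this loop would run continuously against near real time monitoring data; the key property is that the output is a deterministic function of the observed state.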

Tools and Patterns

There are many tools available to establish the workflows shown above. The provision step will almost always be provider specific and based on the provider's API. Some tools, such as Terraform, attempt to abstract this away from the provider, with mixed results. The configure step might use Ansible or a similar tool. The build, test and deploy process will likely use a tool like Jenkins for automation. In some cases the above process may span multiple providers, all integrated by your application.

Regardless of the tools you choose, the most important characteristic of a cloud native application is that all of the activities listed are automated and deterministic.

Software Engineering

Increase Munin Resolution to sub-minute

I previously explained how to get one-minute resolution in Munin. Getting sub-minute resolution is trickier. The main reason is that cron only runs once per minute, which means data must be generated and cached between cron runs, then collected the next time cron fires.


In the case where a single datapoint is collected each time cron runs, the time at which cron runs is sufficient to store the data in rrd. With multiple datapoints being collected on a single cron run, it’s necessary to embed a timestamp with each datapoint so the datapoints can be properly stored in rrd.

For example, the load plugin which produces this for a one minute or greater collection time:

load.value 0.00

Would need to produce output like this for a five (5) second collection time:

load.value 1426889886:0.00
load.value 1426889891:0.00
load.value 1426889896:0.00
load.value 1426889901:0.00
load.value 1426889906:0.00
load.value 1426889911:0.00
load.value 1426889916:0.00
load.value 1426889921:0.00
load.value 1426889926:0.00
load.value 1426889931:0.00
load.value 1426889936:0.00
load.value 1426889941:0.00
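Producing that timestamped format is straightforward in any language; here is a Python sketch (the field name and sample values match the example above):

```python
# Format cached samples as Munin datapoints with embedded epoch
# timestamps, as required when several samples accumulate between
# cron runs.
def format_datapoints(field, samples):
    """samples is a list of (epoch_seconds, value_string) tuples."""
    return ["%s.value %d:%s" % (field, ts, value) for ts, value in samples]

# three samples, five seconds apart, starting at the example epoch
samples = [(1426889886 + i * 5, "0.00") for i in range(3)]
lines = format_datapoints("load", samples)
# lines[0] == "load.value 1426889886:0.00"
```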

Caching mechanism

In one example implementation of a one second collection rate, a datafile is either appended to or replaced using standard Linux file management mechanisms.

This looks a lot like a message queue problem. Each datapoint is published to the queue. The call to fetch acts like a subscriber, pulling all available messages and emptying the queue.
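The queue-style cache can be sketched with an in-memory queue in Python; in the real plugin a Redis list plays this role, but the publish/drain behavior is the same:

```python
from collections import deque

# Sketch of the cache-as-queue idea: acquire publishes one datapoint
# per sample interval, and fetch drains everything collected since
# the last cron run.
cache = deque()

def acquire(datapoint):
    cache.append(datapoint)        # publish one sample

def fetch():
    drained = []
    while cache:
        drained.append(cache.popleft())  # consume in arrival order
    return drained
```

After a fetch the cache is empty, which is exactly why a separate read-only inspection command (like Redis LRANGE, shown later) is useful for debugging.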

Acquire must be long running

In the case of one minute resolution, the data can be generated at the moment it is collected. This means the process run by cron is sufficient to generate the desired data and can die after the data is output. For sub-minute resolution, a separate long running process is required to generate and cache the data. There are a couple of ways to accomplish this.

  1. Start a process that runs only until the next cron run. A new process would be started each time cron fetches the data.
  2. Create a daemon process that will produce a stream of data.

A possible pitfall with #2 above is that it would continue producing data even if the collection cron was failing. Option #1 results in more total processes being started.

Example using Redis

Redis is a very fast key/value datastore that runs natively on Linux. The following example shows how to use a bash script based plugin with Redis as the cache between cron runs. I can install Redis on Ubuntu using apt-get.

sudo apt-get install -y redis-server

And here is the plugin.

#!/bin/bash
# (c) 2015  - Daniel Watrous
update_rate=5                   # sampling interval in seconds
cron_cycle=1                    # time between cron runs in minutes
pluginfull="$0"                 # full path to plugin
plugin="${0##*/}"               # name of plugin
redis_cache="$plugin.cache"     # Redis list used as the cache
graph="$plugin"
section="system"
style="LINE"
run_acquire() {
   while true; do
      sleep $update_rate
      datapoint="$(cat /proc/loadavg | awk '{print "load.value " systime() ":" $1}')"
      redis-cli RPUSH $redis_cache "$datapoint" > /dev/null
   done
}
# --------------------------------------------------------------------------
run_autoconf() {
   echo "yes"
}
run_config() {
cat << EOF
graph_title $graph
graph_category $section
graph_vlabel System Load
graph_scale no
update_rate $update_rate
graph_data_size custom 1d, 10s for 1w, 1m for 1t, 5m for 1y
load.label load
load.draw $style
EOF
}
run_fetch()  {
   # restart acquire, bounded so it dies if cron stops calling fetch
   timeout_calc=$(expr $cron_cycle \* 60 + 5)
   timeout $timeout_calc $pluginfull acquire >/dev/null &
   while true; do
      datapoint="$(redis-cli LPOP $redis_cache)"
      if [ "$datapoint" = "" ]; then
         break
      fi
      echo $datapoint
   done
}
case $1 in
   autoconf)
      run_autoconf
      ;;
   config)
      run_config
      ;;
   acquire)
      run_acquire
      ;;
   *)
      run_fetch
      ;;
esac
exit 0

Restart munin-node to find plugin

Before the new plugin will be found and executed, it’s necessary to restart munin-node. If autoconf returns yes, data collection will start automatically.

ubuntu@munin-dup:~$ sudo service munin-node restart
munin-node stop/waiting
munin-node start/running, process 4684

It’s possible to view the cached values using the LRANGE command without disturbing their collection. Recall that calling fetch will remove them from the queue, so leave that call to Munin.

ubuntu@munin-dup:~$ redis-cli LRANGE load_dynamic.cache 0 -1
1) "load.value 1426910287:0.13"
2) "load.value 1426910292:0.12"
3) "load.value 1426910297:0.11"
4) "load.value 1426910302:0.10"

That’s it. Now you have a Munin plugin with resolution down to the second.

Software Engineering

Increase Munin Resolution to One Minute

I’ve recently been conducting some performance testing of a PaaS solution. In an effort to capture specific details from these performance tests and see how the test systems hold up under load, I looked into a few monitoring tools. Munin caught my eye as being easy to set up and having a large set of data available out of the box. The plugin architecture also made it attractive.

One major drawback to Munin was the five minute resolution. During performance tests, it’s necessary to capture data much more frequently. With the latest Munin, it’s possible to easily get down to one minute resolution with only a couple of minor changes.

One minute sampling

Since Munin runs on a cron job, the first step to increase the sampling resolution is to run the cron job more frequently. There are two cron jobs to consider.


The /etc/cron.d/munin-node cron runs the plugins that actually generate datapoints and store them in RRD files. In order to increase the amount of data collected, modify this file to sample more frequently (up to once a minute).


The /etc/cron.d/munin cron runs the Munin core process which builds the HTML and generates the graphs. Running this cron more frequently will not increase the number of data samples collected. However, if you do modify the munin-node cron to collect more samples, you may want to update this cron so that data is visible as it is collected.
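As a sketch, the schedule change amounts to editing the first field of the cron entry. The user and command portions (elided here as `...`) vary by distribution, so check your installed file rather than copying this verbatim:

```shell
# /etc/cron.d/munin-node -- only the schedule field changes;
# keep the existing user and command portion of the line as-is.

# before: sample every five minutes
# */5 * * * *   ...

# after: sample every minute
# *   * * * *   ...
```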

Change plugin config to capture more data

The plugin config must be modified to include (or change) two arguments.

  • update_rate: this modifies the step for the rrd file
  • graph_data_size: this sets the rrd file size to accommodate the desired amount of data

To sample at a rate of once per minute, the config would need to return the following along with other values.

graph_data_size custom 1d, 1m for 1w, 5m for 1t, 15m for 1y
update_rate 60

Historical Data

After changing the rrd step using update_rate and redefining how much data to capture using graph_data_size, the existing rrd file will no longer accommodate the new data. If the historical data is not needed, the rrd files can be deleted from /var/lib/munin/<domain>. New files with the desired step and amount of data will be created when update is run next.

If the historical data is needed, it may be possible to reconfigure the existing data files, but I haven’t personally tried this.

Software Engineering

Understanding Munin Plugins

Munin is a monitoring tool which captures and graphs system data, such as CPU utilization, load and I/O. Munin is designed so that all data is collected by plugins. This means that every built in graph is a plugin that was included with the Munin distribution. Each plugin adheres to the same interface (not a literal OO interface), as shown below.


Munin uses Round Robin Database files (.rrd) to store captured data. The default time configuration in Munin collects data in five minute increments.

Some important details:

  • Plugins can be written in any language, including shell scripts, interpreted languages and even compiled languages like C.
  • Plugin output prints to stdout.
  • When the plugin is called with no arguments or with fetch, the output should be the actual data from the monitor.
  • The config function defines the characteristics of the graph that is produced.


config

Each Munin plugin is expected to respond to a config call. The output of config is a list of name/value pairs, one per line, a space separating name and value. Config values are further separated into two subsets of configuration data. One defines global properties of the graph while the other defines the details for each datapoint defined for the graph. A reference describing these items is available here: http://munin.readthedocs.org/en/latest/reference/plugin.html.

Here’s the config output from the load plugin. Several graph properties are defined followed by a definition of the single datapoint, “load”.

ubuntu@munin:~$ sudo munin-run load config
graph_title Load average
graph_args --base 1000 -l 0
graph_vlabel load
graph_scale no
graph_category system
graph_info The load average of the machine describes how many processes are in the run-queue (scheduled to run "immediately").
load.info 5 minute load average
load.label load
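Parsing that name/value output into graph-level and datapoint-level settings is simple; here is a Python sketch (not part of Munin itself, just an illustration of the format):

```python
# Split Munin "config" output into graph-level settings and
# per-datapoint settings (names containing a dot, e.g. "load.label").
def parse_config(text):
    graph, fields = {}, {}
    for line in text.strip().splitlines():
        name, _, value = line.partition(" ")
        if "." in name:
            field, attr = name.split(".", 1)
            fields.setdefault(field, {})[attr] = value
        else:
            graph[name] = value
    return graph, fields

sample = ("graph_title Load average\n"
          "graph_scale no\n"
          "load.label load\n"
          "load.info 5 minute load average")
graph, fields = parse_config(sample)
```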

One graph parameter which is not included on the plugin reference above is graph_data_size. This parameter makes it possible to change the size and structure of the RRD file to capture and retain data at different levels of granularity (default is normal, which captures data at a five minute interval).


autoconf

The autoconf call should return “yes” or “no”, indicating whether the plugin should be activated by default. This part of the script might check that a required library or specific service is installed on the system to determine whether the plugin should be enabled.

fetch or no argument

This should return the actual values, either in real-time or as cached since the last fetch. If a plugin is called without any arguments, the plugin should return the same as if fetch were called.
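The dispatch contract can be summarized in a minimal plugin skeleton; this Python sketch uses a hard-coded value in place of a real measurement:

```python
import sys

# Minimal Munin plugin skeleton showing the argument contract:
# "config" describes the graph, "autoconf" answers yes/no, and
# "fetch" (or no argument at all) prints the current value.
def current_value():
    return "0.01"   # stand-in for a real measurement

def run(arg):
    if arg == "config":
        return "graph_title Example value\nexample.label example"
    if arg == "autoconf":
        return "yes"
    # "fetch" and the no-argument case behave identically
    return "example.value " + current_value()

if __name__ == "__main__":
    print(run(sys.argv[1] if len(sys.argv) > 1 else "fetch"))
```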

Install plugin

Plugins are typically kept in /usr/share/munin/plugins. A symlink is then created from /etc/munin/plugins pointing to the plugin.

Running plugins

It’s possible to run a plugin manually using munin-run.

ubuntu@munin:~$ sudo munin-run load
load.value 0.01


The Munin wiki contains an example, so for now I’ll link to that rather than create my own simple example.




Software Engineering

MongoDB monitoring with mongotop

In the process of tuning the performance of a MongoDB replica set, it’s useful to be able to observe mongod directly, as opposed to inferring what it’s doing by watching the output of top, for example. For that reason MongoDB comes with a utility, mongotop.

The output of mongotop indicates the amount of time the mongod process spends reading and writing to a specific collection during the update interval. I used the following command to run mongotop against an authentication enabled replica set with a two second interval.

[watrous@d1t0156g ~]# mongotop -p -u admin 2
connected to:
Enter password:
                              ns       total        read       write           2013-01-11T23:41:51
                          admin.         0ms         0ms         0ms
            admin.system.indexes         0ms         0ms         0ms
         admin.system.namespaces         0ms         0ms         0ms
              admin.system.users         0ms         0ms         0ms
 coursetracker.system.namespaces         0ms         0ms         0ms
document_queue.system.namespaces         0ms         0ms         0ms

The output doesn’t refresh in the same way top does. Instead it aggregates, similar to running tail -f. When I began my experiment I could immediately see the resulting load:

                              ns       total        read       write           2013-01-11T23:41:19
                   documents.nav        60ms        60ms         0ms
               documents.product        53ms        53ms         0ms
                          admin.         0ms         0ms         0ms
            admin.system.indexes         0ms         0ms         0ms
         admin.system.namespaces         0ms         0ms         0ms
              admin.system.users         0ms         0ms         0ms
 coursetracker.system.namespaces         0ms         0ms         0ms
                              ns       total        read       write           2013-01-11T23:41:21
                   documents.nav        82ms        82ms         0ms
               documents.product        54ms        54ms         0ms
                          admin.         0ms         0ms         0ms
            admin.system.indexes         0ms         0ms         0ms
         admin.system.namespaces         0ms         0ms         0ms
              admin.system.users         0ms         0ms         0ms
 coursetracker.system.namespaces         0ms         0ms         0ms
                              ns       total        read       write           2013-01-11T23:41:23
                   documents.nav        63ms        63ms         0ms
               documents.product        45ms        45ms         0ms
                          admin.         0ms         0ms         0ms
            admin.system.indexes         0ms         0ms         0ms
         admin.system.namespaces         0ms         0ms         0ms
              admin.system.users         0ms         0ms         0ms
 coursetracker.system.namespaces         0ms         0ms         0ms

A related performance utility is mongostat.

Verified load balancing

Before running the experiment I set the ReadPreference to nearest. As a result, I expected to see a well balanced but asymmetrical distribution between nodes in my replica set, with all hosts responding to queries. That’s exactly what I saw.

Software Engineering

Lightweight Replication Monitoring with MongoDB

One of my applications runs on a large assortment of hosts split between various data centers. Some of these are redundant pairs and others are in load balanced clusters. They all require a set of identical files which represent static content and other data.

rsync was chosen to facilitate replication of data from a source to many targets. What rsync lacked out of the box was a reporting mechanism to verify that the collection of files across target systems was consistent with the source.

Existing solutions

Before designing my solution, I searched for an existing solution to the same problem. I found backup monitor, but the last release was three years ago and there have only ever been 25 downloads, so it was less than compelling. It was also a heavier solution than I was interested in.

In this case it seems that developing a new solution is appropriate.

Monitoring requirements

The goal was to have each target system run a lightweight process at scheduled intervals and send a report to an aggregator service. A report could then be generated based on the aggregated data.

My solution includes a few components. One component analyzes the current state of files on disk and writes that state to a state file. Another component needs to read that file and transmit it to the aggregator. The aggregator needs to store the state identified by the host to which it corresponds. Finally there needs to be a reporting mechanism to display the data for all hosts.

Due to the distributed nature of the replication targets, the solution should be centralized so that changes in reporting structure are picked up by target hosts quickly and with minimal effort.

Current disk state analysis

This component potentially analyzes many hundreds of thousands of files, so it must run very fast and be reliable. The speed requirement eliminated some of the scripting languages that might otherwise be appealing (e.g. Perl, Python, etc.).

Instead I chose to write this as a bash script and make use of existing system utilities. The utilities I use include du, find and wc. Here’s what I came up with:

#!/bin/bash
# Generate a report showing the sizes
# and file counts of replicated folders
# create a path reference to the report file
BASEDIR="$( cd "$( dirname "$0" )" && pwd )"
reportfile="$BASEDIR/spacereport"
# create/overwrite the report file; write date
date '+%Y-%m-%d %H:%M:%S' > $reportfile
# append details to report file
du -sh /path/to/replicated/files/* | while read size dir;
do
    echo -n "$size ";
    # augment du output with count of files in the directory
    echo -n `find "$dir" -type f|wc -l`;
    echo " $dir ";
done >> $reportfile

These commands run very fast and produce an output that looks like this:

2012-08-06 21:45:10
4.5M 101 /path/to/replicated/files/style
24M 2002 /path/to/replicated/files/html
6.7G 477505 /path/to/replicated/files/images
761M 1 /path/to/replicated/files/transfer.tgz
30G 216 /path/to/replicated/files/data

Notice that the file output is space and newline delimited. It’s not great for human readability, but you’ll see in a minute that with regular expressions it’s super easy to build a report to send to the aggregator.

Read state and transmit to aggregator

Now that we have a report cached describing our current disk state, we need to format that properly and send it to the aggregator. To do this, Python seemed a good fit.

But first, I needed to be able to quickly and reliably extract information from this plain text report. Regular expressions seemed like a great fit for this, so I used my favorite regex tool, Kodos. The two expressions I need are to extract the date and then each line of report data.

You can see what I came up with in the Python script below.

#!/usr/bin/env python
# Name:        spacereport
# Purpose:     run spacereport.sh and report the results to a central service
import os
import re
import urllib, urllib2
from datetime import datetime
from socket import gethostname
def main():
    # where to send the report
    url = 'http://example.com/spacereport.php'
    # define regular expression(s)
    regexp_size_directory = re.compile(r"""([0-9.KGM]*)\s*([0-9]*)\s*[a-zA-Z/]*/(.+)""",  re.MULTILINE)
    regexp_report_time = re.compile(r"""^([0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2})\n""")
    # run the spacereport.sh script to generate plain text report
    base_dir = os.path.dirname(os.path.realpath(__file__))
    os.system(os.path.join(base_dir, 'spacereport.sh'))
    # parse space data from file
    spacedata = open(os.path.join(base_dir, 'spacereport')).read()
    space_report_time = regexp_report_time.search(spacedata).group(1)
    space_data_directories = regexp_size_directory.findall(spacedata)
    # create space data transmission
    report_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    hostname = gethostname()
    space_data = {'host': hostname, 'report_time': space_report_time, 'report_time_sent': report_time, 'space_data_by_directory': []}
    for space_data_directory in space_data_directories:
        space_data['space_data_by_directory'].append({'size_on_disk': space_data_directory[0], 'files_on_disk': space_data_directory[1], 'directory': space_data_directory[2]})
    # prepare the report
    # it might be better to use the json library for this :)
    report = {'report': str(space_data).replace("'", "\"")}
    # encode and send the report
    data = urllib.urlencode(report)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    # You can optionally output the response to verify that it worked
    the_page = response.read()
if __name__ == '__main__':
    main()

The variable report contains a value similar to what’s shown below:

{'report': '{"report_time": "2012-08-06 21:45:10", "host": "WATROUS1", "space_data_by_directory": [{"files_on_disk": "101", "directory": "style", "size_on_disk": "4.5M"}, {"files_on_disk": "2002", "directory": "html", "size_on_disk": "24M"}, {"files_on_disk": "477505", "directory": "images", "size_on_disk": "6.7G"}, {"files_on_disk": "1", "directory": "transfer.tgz", "size_on_disk": "761M"}, {"files_on_disk": "216", "directory": "data", "size_on_disk": "30G"}], "report_time_sent": "2012-08-06 16:20:53"}'}

The difference between report_time and report_time_sent, if there is a difference, is important. It can alert you to an error on the system preventing a new, valid report from being created by your shell script. It can also signal load issues if the gap is too wide.
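A sketch of that comparison in Python; the five minute staleness threshold is an illustrative assumption:

```python
from datetime import datetime

# Compare when a report was generated on disk with when it was sent.
# A large gap suggests the shell script is failing to produce fresh
# reports, or that the host is under heavy load.
def report_lag_seconds(report_time, report_time_sent,
                       fmt="%Y-%m-%d %H:%M:%S"):
    created = datetime.strptime(report_time, fmt)
    sent = datetime.strptime(report_time_sent, fmt)
    return (sent - created).total_seconds()

lag = report_lag_seconds("2012-08-06 21:45:10", "2012-08-06 21:47:40")
stale = lag > 300   # illustrative five minute threshold
```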

Capture and store state data

Now we need to create spacereport.php, the aggregation script. Its sole job is to receive the report and store it in MongoDB. This is made easy using PHP’s built in json_decode and MongoDB support. After the call to json_decode, the dates still need to be converted to MongoDates.

<?php
if ($_SERVER['REQUEST_METHOD'] == 'POST') {
  // decode JSON report and convert dates to MongoDate
  $spacereport = json_decode($_POST['report'], true);
  $spacereport['report_time_sent'] = new MongoDate(strtotime($spacereport['report_time_sent']));
  $spacereport['report_time'] = new MongoDate(strtotime($spacereport['report_time']));
  // connect to MongoDB
  $mongoConnection = new Mongo("m1.example.com,m2.example.com", array("replicaSet" => "replicasetname"));
  // select a database
  $db = $mongoConnection->reports;
  // select a collection
  $collection = $db->spacereport;
  // add a record
  $collection->insert($spacereport);
} else {
  // this should probably set the STATUS to 405 Method Not Allowed
  echo 'not POST';
}
?>
At this point, the data is now available in MongoDB. It’s possible to use Mongo’s query mechanism to query the data.

PRIMARY> db.spacereport.find({host: 't1.example.com'}).limit(1).pretty()
{
        "_id" : ObjectId("501fb1d47540c4df76000073"),
        "report_time" : ISODate("2012-08-06T12:00:11Z"),
        "host" : "t1.example.com",
        "space_data_by_directory" : [
                {
                        "files_on_disk" : "101",
                        "directory" : "style ",
                        "size_on_disk" : "4.5M"
                },
                {
                        "files_on_disk" : "2001",
                        "directory" : "html ",
                        "size_on_disk" : "24M"
                },
                {
                        "files_on_disk" : "477505",
                        "directory" : "directory ",
                        "size_on_disk" : "6.7G"
                },
                {
                        "files_on_disk" : "1",
                        "directory" : "transfer.tgz ",
                        "size_on_disk" : "761M"
                },
                {
                        "files_on_disk" : "215",
                        "directory" : "data ",
                        "size_on_disk" : "30G"
                }
        ],
        "report_time_sent" : ISODate("2012-08-06T12:00:20Z")
}
NOTE: For this system, historical data quickly diminishes in importance since what I’m interested in is the current state of replication. For that reason I made the spacereport collection capped to 1MB or 100 records.

PRIMARY> db.runCommand({"convertToCapped": "spacereport", size: 1045876, max: 100});
{ "ok" : 1 }

Display state data report

It’s not very useful to look at the data one record at a time, so we need some way of viewing the data as a whole. PHP is convenient, so we’ll use that to create a web based report.

<style>
body {
    font-family: Arial, Helvetica, Sans serif;
}
.directoryname {
    float: left;
    width: 350px;
}
.sizeondisk {
    float: left;
    text-align: right;
    width: 150px;
}
.numberoffiles {
    float: left;
    text-align: right;
    width: 150px;
}
</style>
<?php
$host_display_template = "<hr />\n<strong>%s</strong> showing report at <em>%s</em> (of %d total reports)<br />\n";
$spacedata_row_template = "<div class='directoryname'>%s</div> <div class='sizeondisk'>%s</div> <div class='numberoffiles'>%d total files</div><div style='clear: both;'></div>\n";
$mongoConnection = new Mongo("m1.example.com,m2.example.com", array("replicaSet" => "replicasetname"));
// select a database
$db = $mongoConnection->reports;
// select a collection
$collection = $db->spacereport;
// group the collection to get a unique list of all hosts reporting space data
$key = array("host" => 1);
$initial = array("reports" => 0);
$reduce = "function(obj, prev) {prev.reports++;}";
$reports_by_host = $collection->group($key, $initial, $reduce);
// cycle through all hosts found above
foreach ($reports_by_host['retval'] as $report_by_host) {
    // grab the reports for this host and sort to find most recent report
    $cursor = $collection->find(array("host" => $report_by_host['host']));
    $cursor->sort(array("report_time_sent" => -1))->limit(1);
    foreach ($cursor as $obj) {
        // output details about this host and the report timing
        printf ($host_display_template, $report_by_host['host'], date('M-d-Y H:i:s', $obj['report_time']->sec), $report_by_host['reports']);
        foreach ($obj["space_data_by_directory"] as $directory) {
            // output details about this directory
            printf ($spacedata_row_template, $directory["directory"], $directory["size_on_disk"], $directory["files_on_disk"]);
        }
    }
}
?>
Centralizing the service

At this point the entire reporting structure is in place, but it requires manual installation or updates on each host where it runs. Even if you only have a handful of hosts, it can quickly become a pain to update them by hand each time you want to change the structure.

To get around this, host the two scripts responsible for creating and sending the report in some location that's accessible to the target hosts. Then run a third script that grabs the latest copies of those scripts and runs the report.

#!/bin/bash
# download the latest spacereport scripts
# and run to update central aggregation point
BASEDIR=$(dirname $0)
# use wget to grab the latest scripts
wget -q http://c.example.com/spacereport.py -O $BASEDIR/spacereport.py
wget -q http://c.example.com/spacereport.sh -O $BASEDIR/spacereport.sh
# make sure that spacereport.sh is executable
chmod +x $BASEDIR/spacereport.sh
# create and send a spacereport
python $BASEDIR/spacereport.py

Since MongoDB is schemaless, the structure of the reports can be changed at will. Provided legacy values are left in place, no changes are required at any other point in the reporting process.

Future Enhancements

Possible enhancements could include active monitoring, such as allowing an administrator to define rules that would trigger notifications based on the data being aggregated. This monitoring could be implemented as a hook in the spacereport.php aggregator script or based on a cron. Rules could include comparisons between hosts, self comparison with historical data for the same host, or comparison to baseline data external to the hosts being monitored.

Some refactoring to generalize the shell script that produces the initial plain text report may improve efficiency and/or flexibility, though it’s difficult to imagine that writing a custom program to replace existing system utilities would be worthwhile.

The ubiquity of support for MongoDB and JSON would make it easy to reimplement any of the above components in other languages if there’s a better fit for a certain project.