Increase Munin Resolution to sub-minute

March 20, 2015

Software Engineering

No Comments

Daniel Watrous

I previously explained how to get one-minute resolution in Munin. The process to get sub-minute resolution in Munin is more tricky. The main reason it’s more tricky is that cron only runs once per minute, which means data must be generated and cached in between cron runs for collection when cron runs.

In the case where a single datapoint is collected each time cron runs, the time at which cron runs is sufficient to store the data in rrd. With multiple datapoints being collected on a single cron run, it’s necessary to embed a timestamp with each datapoint so the datapoints can be properly stored in rrd.

For example, the load plugin which produces this for a one minute or greater collection time:

load.value 0.00

Would need to produce output like this for a five (5) second collection time:

load.value 1426889886:0.00
load.value 1426889891:0.00
load.value 1426889896:0.00
load.value 1426889901:0.00
load.value 1426889906:0.00
load.value 1426889911:0.00
load.value 1426889916:0.00
load.value 1426889921:0.00
load.value 1426889926:0.00
load.value 1426889931:0.00
load.value 1426889936:0.00
load.value 1426889941:0.00

Caching mechanism

In one example implementation of a one second collection rate, a datafile is either appended to or replaced using standard Linux file management mechanisms.

This looks a lot like a message queue problem. Each message needs to be published to the queue. The call to fetch is like the subscriber who then pulls all available messages, which empties the queue from the cache window.

Acquire must be long running

In the case of one minute resolution, the data can be generated at the moment it is collected. This means the process run by cron is sufficient to generate the desired data and can die after the data is output. For sub-minute resolution, a separate long running process is required to generate and cache the data. There are a couple of ways to accomplish this.

Start a process that will only run util the next cron runs. This would be started each time cron fetched the data
Create a daemon process that will produce a stream of data.

A possible pitfall with #2 above is that it would continue producing data even if the collection cron was failing. Option #1 results in more total processes being started.

Example using Redis

Redis is a very fast key/value datastore that runs natively on Linux. The following example shows how to use a bash script based plugin with Redis as the cache between cron runs. I can install Redis on Ubuntu using apt-get.

sudo apt-get install -y redis-server

And here is the plugin.

#!/bin/bash
# (c) 2015  - Daniel Watrous
 
update_rate=5                   # sampling interval in seconds
cron_cycle=1                    # time between cron runs in minutes
pluginfull="$0"                 # full name of plugin
plugin="${0##*/}"               # name of plugin
redis_cache="$plugin.cache"
graph="$plugin"
section="system:load"
style="LINE"
 
run_acquire() {
   while [ "true" ]
   do
      sleep $update_rate
      datapoint="$(cat /proc/loadavg | awk '{print "load.value " systime() ":" $1}')"
      redis-cli RPUSH $redis_cache "$datapoint"
   done
}
 
# --------------------------------------------------------------------------
 
run_autoconf() {
   echo "yes"
}
 
run_config() {
cat << EOF
graph_title $graph
graph_category $section
graph_vlabel System Load
graph_scale no
update_rate $update_rate
graph_data_size custom 1d, 10s for 1w, 1m for 1t, 5m for 1y
load.label load
load.draw $style
style=STACK
EOF
}
 
run_fetch()  {
   timeout_calc=$(expr $cron_cycle \* 60 + 5)
   timeout $timeout_calc $pluginfull acquire >/dev/null &
   while [ "true" ]
   do
     datapoint="$(redis-cli LPOP $redis_cache)"
     if [ "$datapoint" = "" ]; then
       break
     fi
     echo $datapoint
   done
}
 
 
run_${1:-fetch}
exit 0

Restart munin-node to find plugin

Before the new plugin will be found and executed, it’s necessary to restart munin-node. If the autoconfig returns yes data collection will start automatically.

ubuntu@munin-dup:~$ sudo service munin-node restart
munin-node stop/waiting
munin-node start/running, process 4684

It’s possible to view the cached values using the LRANGE command without disturbing their collection. Recall that calling fetch will remove them from the queue, so you want to leave Munin to call that.

ubuntu@munin-dup:~$ redis-cli LRANGE load_dynamic.cache 0 -1
1) "load.value 1426910287:0.13"
2) "load.value 1426910292:0.12"
3) "load.value 1426910297:0.11"
4) "load.value 1426910302:0.10"

That’s it. Now you have a Munin plugin with resolution down to the second.

PRV POST

NXT POST