Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged php

Software Engineering

Install Stackato (CloudFoundry) on HPCloud

I recently published an article to get CloudFoundry running by way of Stackato on a local machine using VirtualBox. As soon as you have something ready to share with the world (testers, executives, investors, etc.), you’ll want something more public. Fortunately, it’s easy to run Stackato on HPCloud.com. I’m following the steps outlined here: https://docs.stackato.com/admin/server/hpcs.html.

Configuration of HPCloud.com

For the security groups, I created two separate groups, one for SSH and another for web. I do this to allow for separation of access and web service functions in the future (using a Bastian host for example).


Launch an Instance

My settings for the instance are straight forward and shown below. I’m using a medium size instance and a I check both the web and remote security groups that I created above. If you haven’t already added a key pair for secure remote access, do that before creating the instance. If you forget to do this, the default password for the stackato user is ‘stackato’, at least until you complete the setup of the first admin user.


And the security group assignments.


I follow the instructions as provided to associate a floating IP address with the new instance and create a DNS entry for easy access to my new stackato installation. In my case I didn’t move DNS management to hpcloud.com. Instead I just added A and CNAME records where it was already being managed.

Role and name management

When I installed Stackato on VirtualBox previously, I had to rename the node to use wildcard DNS. In this case I need to rename it to use the domain name for which I created the A and CNAME records above.

For my scenario I ran the following two commands:

kato role remove mdns
kato node rename stackato.danielwatrous.com

You may see some errors about it being unable to resolve the default node name. You can safely ignore these. Restarting all the affected roles can take a few minutes.

Stackato uses HTTPS by default. However, if you are using the default certificate, you may have to bypass some security warnings in your browser.

Pushing code

You should be all set to push an app to the newly deployed Stackato instance. Let’s push a simple PHP app. Two files are required to push this using the stackato command line client. First is the PHP file, index.php. Next is the manifest yaml file. The manifest is required so that CloudFoundry can identify the correct buildpack to prepare the instance.



And a very basic manifest.yml file.

- name: test-php
  framework: php

The console session to deploy this simple app is shown below:

F:\cf-php-test>stackato target stackato.danielwatrous.com
Host redirects to: 'https://api.stackato.danielwatrous.com'
Successfully targeted to [https://api.stackato.danielwatrous.com]
Target:       https://api.stackato.danielwatrous.com
Organization: <none>
Space:        <none>
F:\cf-php-test>stackato login watrous
Attempting login to [https://api.stackato.danielwatrous.com]
Password: ********
Successfully logged into [https://api.stackato.danielwatrous.com]
Choosing the one available organization: "DanielWatrous.com"
Choosing the one available space: "Explore"
Target:       https://api.stackato.danielwatrous.com
Organization: DanielWatrous.com
Space:        Explore
No license installed.
Using 4G of 4G.
F:\cf-php-test>stackato push
Would you like to deploy from the current directory ?  [Yn]:
Using manifest file "manifest.yml"
Application Deployed URL [test-php.stackato.danielwatrous.com]:
Application Url:   https://test-php.stackato.danielwatrous.com
Enter Memory Reservation [256]: 20
Enter Disk Reservation [2048]: 1024
Creating Application [test-php] as [https://api.stackato.danielwatrous.com -> DanielWatrous.com -> Explore -> test-php] ... OK
  Map https://test-php.stackato.danielwatrous.com ... OK
Create services to bind to 'test-php' ?  [yN]:
Uploading Application [test-php] ...
  Checking for bad links ...  OK
  Copying to temp space ...  OK
  Checking for available resources ...  OK
  Processing resources ... OK
  Packing application ... OK
  Uploading (231) ...  OK
Push Status: OK
Starting Application [test-php] ...
stackato[dea_ng]: Staging application
stackato[fence]: Created Docker container
stackato[fence]: Prepared Docker container
stackato[cloud_controller_ng]: Updated app 'test-php' -- {"console"=>true, "state"=>"STARTED"}
staging: -----> Downloaded app package (4.0M)
staging: ****************************************************************************
staging: * Using the legacy buildpack to stage a 'php' framework application.
staging: *
staging: * Note that the legacy buildpack is a migration tool to provide backwards
staging: * compatibility while moving from Stackato 2.x to Stackato 3.0.  It is not
staging: * updated with new features beyond what Stackato 2.10.6 supplied.
staging: *
staging: * Please use a non-legacy buildpack for any new code developed for Stackato!
staging: ****************************************************************************
staging: end of staging
staging: -----> Uploading droplet (4.0M)
stackato[dea_ng]: Uploading droplet
stackato[dea_ng]: Completed uploading droplet
stackato[fence]: Destroyed Docker container
stackato[fence.0]: Created Docker container
stackato[fence.0]: Prepared Docker container
stackato[dea_ng.0]: Launching web process: /home/stackato/startup
app[stderr.0]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using Set the 'ServerName' directive globally to suppress this message
stackato[dea_ng.0]: Instance is ready
http://test-php.stackato.danielwatrous.com/ deployed

At which point I can load my PHP test app, which is just phpinfo().


Software Engineering

Use Docker to Build a LEMP Stack (Buildfile)

I’ve been reviewing Docker recently. As part of that review, I decided to build a LEMP stack in Docker. I use Vagrant to create an environment in which to run Docker. For this experiment I chose to create Buildfiles to create the Docker container images. I’ll be discussing the following files in this post.


Download the Docker LEMP files as a zip (docker-lemp.zip).

Spin up the Host System

I start with Vagrant to spin up a host system for my Docker containers. To do this I use the following files.


# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "public_network"
  config.vm.provider "virtualbox" do |v|
    v.name = "docker LEMP"
    v.cpus = 1
    v.memory = 512
  config.vm.provision :shell, path: "bootstrap.sh"

I start with Ubuntu and keep the size small. I create a public network to make it easier to test my LEMP setup later on.


#!/usr/bin/env bash
# set proxy variables
#export http_proxy=http://proxy.example.com:8080
#export https_proxy=https://proxy.example.com:8080
# bootstrap ansible for convenience on the control box
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
apt-get update
apt-get -y install lxc-docker
# need to add proxy specifically to docker config since it doesn't pick them up from the environment
#sed -i '$a export http_proxy=http://proxy.example.com:8080' /etc/default/docker
#sed -i '$a export https_proxy=https://proxy.example.com:8080' /etc/default/docker
# enable non-root use by vagrant user
groupadd docker
gpasswd -a vagrant docker
# restart to enable proxy
service docker restart

I’m working in a proxied environment, so I need to provide proxy details to the Vagrant host for subsequent Docker installation steps. Unfortunately Docker doesn’t key off the typical environment variables for proxy, so I have to define them explicitly in the docker configuration. This second proxy configuration allows Docker to download images from DockerHub. Finally I create a docker group and add the vagrant user to it so I don’t need sudo for docker commands.

At this point, it’s super easy to play around with Docker and you might enjoy going through the Docker User Guide. We’re ready to move on to create Docker container images for our LEMP stack.

Build the LEMP Stack

This LEMP stack will be split across two containers, one for MySQL and the other for Nginx+PHP. I build on Ubuntu as a base, which may increase the total size of the image in exchange for the ease of using Ubuntu.

MySQL Container

We’ll start with MySQL. Here’s the Dockerfile.


# LEMP stack as a docker container
FROM ubuntu:14.04
MAINTAINER Daniel Watrous <email>
#ENV http_proxy http://proxy.example.com:8080
#ENV https_proxy https://proxy.example.com:8080
RUN apt-get update
RUN apt-get -y upgrade
# seed database password
COPY mysqlpwdseed /root/mysqlpwdseed
RUN debconf-set-selections /root/mysqlpwdseed
RUN apt-get -y install mysql-server
RUN sed -i -e"s/^bind-address\s*=\s* =" /etc/mysql/my.cnf
RUN /usr/sbin/mysqld & \
    sleep 10s &&\
    echo "GRANT ALL ON *.* TO admin@'%' IDENTIFIED BY 'secret' WITH GRANT OPTION; FLUSH PRIVILEGES" | mysql -u root --password=secret &&\
    echo "create database test" | mysql -u root --password=secret
# persistence: http://txt.fliglio.com/2013/11/creating-a-mysql-docker-container/ 
CMD ["/usr/bin/mysqld_safe"]

Notice that I declare my proxy servers for the third time here. This one is so that the container can access package downloads. I then seed the root database passwords and proceed to install and configure MySQL. Before running CMD, I expose port 3306. Remember that this port will be exposed on the private network between Docker containers and is only accessible to linked containers when you start it the way I show below. Here’s the mysqlpwdseed file.


mysql-server mysql-server/root_password password secret
mysql-server mysql-server/root_password_again password secret

If you downloaded the zip file above and ran vagrant in the resulting directory, you should have the files above available in /vagrant/mysql. The following commands will build and start the MySQL container.

cd /vagrant/mysql
docker build -t "local/mysql:v1" .
docker images
docker run -d --name mysql local/mysql:v1
docker ps

At this point you should show a local image for MySQL and a running mysql container, as shown below.


Nginx Container

Now we’ll build the Nginx container with PHP baked in.


# LEMP stack as a docker container
FROM ubuntu:14.04
MAINTAINER Daniel Watrous <email>
ENV http_proxy http://proxy.example.com:8080
ENV https_proxy https://proxy.example.com:8080
# install nginx
RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get -y install nginx
RUN echo "daemon off;" >> /etc/nginx/nginx.conf
RUN mv /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak
COPY default /etc/nginx/sites-available/default
# install PHP
RUN apt-get -y install php5-fpm php5-mysql
RUN sed -i s/\;cgi\.fix_pathinfo\s*\=\s*1/cgi.fix_pathinfo\=0/ /etc/php5/fpm/php.ini
# prepare php test scripts
RUN echo "<?php phpinfo(); ?>" > /usr/share/nginx/html/info.php
ADD wall.php /usr/share/nginx/html/wall.php
# add volumes for debug and file manipulation
VOLUME ["/var/log/", "/usr/share/nginx/html/"]
CMD service php5-fpm start && nginx

After setting the proxy, I install Nginx and update the configuration file. I then install PHP5 and update the php.ini file. I then use two different methods to create php files. If you’re deploying an actual application to a Docker container, you may not end up using either of these, but instead install a script that will grab your application from git or subversion.

Next I define two volumes. You’ll see shortly that this makes it straightforward to view logs and manage code for debug. Finally I expose port 80 for web traffic and the CMD references two commands using && to make sure both PHP and Nginx are running in the container.


server {
        listen 80 default_server;
        listen [::]:80 default_server ipv6only=on;
        root /usr/share/nginx/html;
        index index.php index.html index.htm;
        server_name localhost;
        location / {
                try_files $uri $uri/ =404;
        error_page 404 /404.html;
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
                root /usr/share/nginx/html;
        location ~ \.php$ {
                fastcgi_split_path_info ^(.+\.php)(/.+)$;
                fastcgi_pass unix:/var/run/php5-fpm.sock;
                fastcgi_index index.php;
                include fastcgi_params;

This is my default Nginx configuration file.


// database credentials (defined in group_vars/all)
$dbname = "test";
$dbuser = "admin";
$dbpass = "secret";
$dbhost = "mysql";
// query templates
$create_table = "CREATE TABLE IF NOT EXISTS `wall` (
   `id` int(11) unsigned NOT NULL auto_increment,
   `title` varchar(255) NOT NULL default '',
   `content` text NOT NULL default '',
   PRIMARY KEY  (`id`)
$select_wall = 'SELECT * FROM wall';
// Connect to and select database
$link = mysql_connect($dbhost, $dbuser, $dbpass)
    or die('Could not connect: ' . mysql_error());
echo "Connected successfully\n<br />\n";
mysql_select_db($dbname) or die('Could not select database');
// create table
$result = mysql_query($create_table) or die('Create Table failed: ' . mysql_error());
// handle new wall posts
if (isset($_POST["title"])) {
    $result = mysql_query("insert into wall (title, content) values ('".$_POST["title"]."', '".$_POST["content"]."')") or die('Create Table failed: ' . mysql_error());
// Performing SQL query
$result = mysql_query($select_wall) or die('Query failed: ' . mysql_error());
// Printing results in HTML
echo "<table>\n";
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo "\t<tr>\n";
    foreach ($line as $col_value) {
        echo "\t\t<td>$col_value</td>\n";
    echo "\t</tr>\n";
echo "</table>\n";
// Free resultset
// Closing connection
<form method="post">
Title: <input type="text" name="title"><br />
Message: <textarea name="content"></textarea><br />
<input type="submit" value="Post to wall">

Wall is meant to be a simple example that shows both PHP and MySQL are working. The commands below will build this new image and start it as a linked container to the already running MySQL container.

cd /vagrant/nginx
docker build -t "local/nginx:v1" .
docker images
docker run -d -p 80:80 --link mysql:mysql --name nginx local/nginx:v1
docker ps

So that now your Docker environment looks like the image below with an image for MySQL and one for Nginx and two processes running that are linked together.


As shown in the image above, we have mapped port 80 on the container to port 80 on the host system. This means we can discover the IP address of our host (remember the public network in the vagrant file) and load the web page.


Working with Running Containers

True to the single responsibility container design of Docker, these running containers are only running their respective service(s). That means they aren’t running SSH (and they shouldn’t be). So how do we interact with them, such as viewing the log files on the Nginx server or connecting to MySQL? We use Linked containers that leverage the shared volumes or exposed ports.

Connect to MySQL

To connect to MySQL, create a new container from the same MySQL image and link it to the already running MySQL. Start that container with /bin/bash. You can then use the mysql binaries to connect. Notice that I identify the host as ‘mysql’. That’s because when I linked the containers, Docker added an alias in the /etc/hosts file of the new container that mapped the private Docker network IP to ‘mysql’.

docker run -i -t --link mysql:mysql local/mysql:v1 /bin/bash
mysql -u admin -h mysql test --password=secret


View Ngnix Logs

It’s just as easy to view the Nginx logs. In this case I start up a plain Ubuntu container using the –volumes-from and indicate the running nginx container. From bash I can easily navigate to the Nginx logs or the html directory.

docker run -i -t --volumes-from nginx ubuntu /bin/bash




Software Engineering

Build a Multi-server LEMP stack using Ansible

My objective in this post is to explore the use of Ansible to configure a multi-server LEMP stack. This builds on the preliminary work I did demonstrating how to use Vagrant to create an environment to run Ansible. You can follow this entire example on any Windows (or Linux) host.

Ansible only runs on Linux hosts, not Windows. As a result, I needed to provision one Linux host to act as Ansible controller. One aspect of Ansible that I wanted to explore is the ability to manage multiple hosts with different configurations. For this experiment, I provision two more Linux hosts, one to act as a database host and the other to function as an Nginx/PHP server for a complete LEMP stack. I created the diagram below to illustrate my setup.


There are two primary artifact categories for this experiement:

  • Vagrantfile to provision each host
  • Ansible playbook related files

Since there were more than a few Ansible playbook files, I chose to create a github repository rather than provide all the code here. You can clone/fork the files to run this experiment here:



Here is a list of the files you’ll find in that repository.

  • Vagrantfile
  • control.sh
  • lemp/group_vars/all
  • lemp/hosts
  • lemp/roles/common/handlers/main.yml
  • lemp/roles/common/tasks/main.yml
  • lemp/roles/database/handlers/main.yml
  • lemp/roles/database/tasks/main.yml
  • lemp/roles/web/handlers/main.yml
  • lemp/roles/web/tasks/main.yml
  • lemp/roles/web/templates/default
  • lemp/roles/web/templates/wall.php
  • lemp/site.yml

I do use a bootstrap shell script, control.sh, with Vagrant for the Ansible control server. It is necessary to install Ansible on the control server, but since Ansible doesn’t require an agent, there’s no need to bootstrap the other servers.

Playbook files

For each Ansible defined role there are three artifact categories.

  • handlers
  • tasks
  • templates

Handlers are named tasks that can be called or notified when Ansible detects other events. These are commonly used to trigger service restarts when configuration files change, as an example.

Tasks are the meat of the playbook. This lists out the steps to put a system into a desired state, including installing software, copying templates, registering and calling handlers, etc.

Configuration files, such as the nginx ‘default’ configuration in this case, can be stored in the templates folder and copied to the host using a task. Templates are helpful when a desired configuration differs significantly from a system default, this can be easier than updating individual lines in a file one at a time using lineinfile. The Ansible playbook files are in the following directory.


The site.yml file ties it all together by associating host groups with roles. You run the playbook like this.

ansible-playbook -i hosts site.yml

The example wall.php script should be accessible locally using the port 80->8080 mapping as or over port 80 on the external IP assigned to the web host. Here’s what you can expect to see.



I used the ansible examples repository on Github while putting this together. You may find it useful. For the specifics of installing LEMP on Ubuntu, I followed my Vagrant tutorial.

Software Engineering

Using Vagrant to build a LEMP stack

I may have just fallen in love with the tool Vagrant. Vagrant makes it possible to quickly create a virtual environment for development. It is different than cloning or snapshots in that it uses minimal base OSes and provides a provisioning mechanism to setup and configure the environment exactly the way you want for development. I love this for a few reasons:

  • All developers work in the exact same environment
  • Developers can get a new environment up in minutes
  • Developers don’t need to be experts at setting up the environment.
  • System details can be versioned and stored alongside code

This short tutorial below demonstrates how easy it is to build a LEMP stack using Vagrant.

Install VirtualBox

Vagrant is not a virtualization tool. Instead vagrant will leverage an existing provider of virtual compute resources, either local or remote. For example, Vagrant can be used to create a virtual environment on Amazon Web Services or locally using a tool like VirtualBox. For this tutorial, we’ll use VirtualBox. You can download and install VirtualBox from the official website.


Install Vagrant

Next, we install Vagrant. Downloads are freely available on their website.


For the remainder of this tutorial, I’m going to assume that you’ve been through the getting started training and are somewhat familiar with Vagrant.

Accommodate SSH Keys

UPDATE 6/26/2015: Vagrant introduced the unfortunate feature of producing a random key for each new VM as the default behavior. It’s possible to restore the original functionality (described below) and use the insecure key with the config.ssh.insert_key = false setting in a Vagrantfile.

Until (if ever) Vagrant defaults to using the insecure key, a system wide work around is to add a Vagrantfile to the local .vagrant.d folder which will add set this setting for all VMs (see Load Order and Merging), unless otherwise overridden. The Vagrant file can be as simple as this:

# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.ssh.insert_key = false

Vagrant creates an SSH key which it installs on guest hosts by default. This can be a huge time saver since it prevents the need for passwords. Since I use PuTTY on windows, I needed to convert the SSH key and save a PuTTY session to accommodate connections. Use PuTTYgen to do this.

  1. Open PuTTYgen
  2. Click “Load”
  3. Navigate to the file C:\Users\watrous\.vagrant.d\insecure_private_key

PuTTYgen shows a dialog saying that the import was successful and displays the details of the key, as shown here:


Click “Save private key”. You will be prompted about saving the key without a passphrase, which in this case is fine, since it’s just for local development. If you end up using Vagrant to create public instances, such as using Amazon Web Services, you should use a more secure connection method. Give the key a unique name, like C:\Users\watrous\.vagrant.d\insecure_private_key-putty.ppk and save.

Finally, create a saved PuTTY session to connect to new Vagrant instances. Here are some of my PuTTY settings:



The username may change if you choose a different base OS image from the vagrant cloud, but the settings shown above should work fine for this tutorial.

Get Ready to ‘vagrant up’

Create a directory where you can store the files Vagrant needs to spin up your environment. I’ll refer to this directory as VAGRANT_ENV.

To build a LEMP stack we need a few things. First is a Vagrantfile file where we identify the base OS, or box, ports, etc. This is a text file that follows Ruby language conventions. Create the file VAGRANT_ENV/Vagrantfile with the following contents:

# -*- mode: ruby -*-
# vi: set ft=ruby :
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.
  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision :shell, path: "bootstrap.sh"
  config.vm.network :forwarded_port, host: 4567, guest: 80
  config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"

This file chooses a 64 bit trusty version of Ubuntu, forwards port 4567 on the host machine to port 80 on the guest machine and identifies a bootstrap shell script, which I show next.

Create VAGRANT_ENV/bootstrap.sh with the following contents:

#!/usr/bin/env bash
#accommodate proxy environments
#export http_proxy=http://proxy.company.com:8080
#export https_proxy=https://proxy.company.com:8080
apt-get -y update
apt-get -y install nginx
debconf-set-selections <<< 'mysql-server mysql-server/root_password password secret'
debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password secret'
apt-get -y install mysql-server
apt-get -y install php5-fpm php5-mysql
sed -i s/\;cgi\.fix_pathinfo\s*\=\s*1/cgi.fix_pathinfo\=0/ /etc/php5/fpm/php.ini
service php5-fpm restart
mv /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak
cp /vagrant/default /etc/nginx/sites-available/default
service nginx restart
echo "<?php phpinfo(); ?>" > /usr/share/nginx/html/info.php

This script executes a sequence of commands from the shell as root after provisioning the new server. This script must run without requiring user input. It also should accommodate any configuration changes and restarts necessary to get your environment ready to use.

More sophisticated tools like Ansible, Chef and Puppet can also be used.

you may have noticed that the above script expects a modified version of nginx’s default configuration. Create the file VAGRANT_ENV/default with the following contents:

server {
	listen 80 default_server;
	listen [::]:80 default_server ipv6only=on;
	root /usr/share/nginx/html;
	index index.php index.html index.htm;
	server_name localhost;
	location / {
		try_files $uri $uri/ =404;
	error_page 404 /404.html;
	error_page 500 502 503 504 /50x.html;
	location = /50x.html {
		root /usr/share/nginx/html;
	location ~ \.php$ {
		fastcgi_split_path_info ^(.+\.php)(/.+)$;
		fastcgi_pass unix:/var/run/php5-fpm.sock;
		fastcgi_index index.php;
		include fastcgi_params;

vagrant up

Now it’s time to run ‘vagrant up‘. To do this, open a console window and navigate to your VAGRANT_ENV directory, then run ‘vagrant up’.


If this is the first time you have run ‘vagrant up’, it may take a few minutes to download the ‘box’. Once it’s done, you should be ready to visit your PHP page rendered by nginx on a local virtual machine created and configured by Vagrant:

Software Engineering

Lightweight Replication Monitoring with MongoDB

One of my applications runs on a large assortment of hosts split between various data centers. Some of these are redundant pairs and others are in load balanced clusters. They all require a set of identical files which represent static content and other data.

rsync was chosen to facilitate replication of data from a source to many targets. What rsync lacked out of the box was a reporting mechanism to verify that the collection of files across target systems was consistent with the source.

Existing solutions

Before designing my solution, I searched for an existing solution to the same problem. I found backup monitor, but the last release was three years ago and there have only ever been 25 downloads, so it was less than compelling. It was also a heavier solution than I was interested in.

In this case it seems that developing a new solution is appropriate.

Monitoring requirements

The goal was to have each target system run a lightweight process at scheduled intervals and send a report to an aggregator service. A report could then be generated based on the aggregated data.

My solution includes a few components. One component analyzes the current state of files on disk and writes that state to a state file. Another component needs to read that file and transmit it to the aggregator. The aggregator needs to store the state identified by the host to which it corresponds. Finally there needs to be a reporting mechanism to display the data for all hosts.

Due to the distributed nature of the replication targets, the solution should be centralized so that changes in reporting structure are picked up by target hosts quickly and with minimal effort.

Current disk state analysis

This component potentially analyses many hundreds of thousands of files. That means the solution for this component must run very fast and be reliable. The speed requirements for this component eliminated some of the scripting languages that might otherwise be appealing (e.g. Perl, Python, etc.).

Instead I chose to write this as a bash script and make use of existing system utilities. The utilities I use include du, find and wc. Here’s what I came up with:

# Generate a report showing the sizes 
# and file counts of replicated folders
# create a path reference to the report file
BASEDIR="$( cd "$( dirname "$0" )" && pwd )"
# create/overwrite report the file; write date
date '+%Y-%m-%d %H:%M:%S' > $reportfile
# append details to report file
du -sh /path/to/replicated/files/* | while read size dir;
    echo -n "$size ";
    # augment du output with count of files in the directory
    echo -n `find "$dir" -type f|wc -l`;
    echo " $dir ";
done >> $reportfile

These commands run very fast and produce an output that looks like this:

2012-08-06 21:45:10
4.5M 101 /path/to/replicated/files/style
24M 2002 /path/to/replicated/files/html
6.7G 477505 /path/to/replicated/files/images
761M 1 /path/to/replicated/files/transfer.tgz
30G 216 /path/to/replicated/files/data

Notice that the file output is space and newline delimited. It’s not great for human readability, but you’ll see in a minute that with regular expressions it’s super easy to build a report to send to the aggregator.

Read state and transmit to aggregator

Now that we have a report cached describing our current disk state, we need to format that properly and send it to the aggregator. To do this, Python seemed a good fit.

But first, I needed to be able to quickly and reliably extract information from this plain text report. Regular expressions seemed like a great fit for this, so I used my favorite regex tool, Kodos. The two expressions I need are to extract the date and then each line of report data.

You can see what I came up with in the Python script below.

# Name:        spacereport
# Purpose:     run spacereport.sh and report the results to a central service
#!/usr/bin/env python
import os
import re
import urllib, urllib2
from datetime import datetime
from socket import gethostname
def main():
    # where to send the report
    url = 'http://example.com/spacereport.php'
    # define regular expression(s)
    regexp_size_directory = re.compile(r"""([0-9.KGM]*)\s*([0-9]*)\s*[a-zA-Z/]*/(.+)""",  re.MULTILINE)
    regexp_report_time = re.compile(r"""^([0-9]{4}-[0-9]{2}-[0-9]{2}\s+[0-9]{2}:[0-9]{2}:[0-9]{2})\n""")
    # run the spacereport.sh script to generate plain text report
    base_dir = os.path.dirname(os.path.realpath(__file__))
    os.system(os.path.join(base_dir, 'spacereport.sh'))
    # parse space data from file
    spacedata = open(os.path.join(base_dir, 'spacereport')).read()
    space_report_time = regexp_report_time.search(spacedata).group(1)
    space_data_directories = regexp_size_directory.findall(spacedata)
    # create space data transmission
    report_time = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    hostname = gethostname()
    space_data = {'host': hostname, 'report_time': space_report_time, 'report_time_sent': report_time, 'space_data_by_directory': []}
    for space_data_directory in space_data_directories:
        space_data['space_data_by_directory'].append({'size_on_disk': space_data_directory[0], 'files_on_disk': space_data_directory[1], 'directory': space_data_directory[2]})
    # prepare the report
    # it might be better to use the json library for this :)
    report = {'report': str(space_data).replace("'", "\"")}
    # encode and send the report
    data = urllib.urlencode(report)
    req = urllib2.Request(url, data)
    response = urllib2.urlopen(req)
    # You can optionally output the response to verify that it worked
    the_page = response.read()
if __name__ == '__main__':

The variable report contains a value similar to what’s shown below:

{'report': '{"report_time": "2012-08-06 21:45:10", "host": "WATROUS1", "space_data_by_directory": [{"files_on_disk": "101", "directory": "style", "size_on_disk": "4.5M"}, {"files_on_disk": "2002", "directory": "html", "size_on_disk": "24M"}, {"files_on_disk": "477505", "directory": "images", "size_on_disk": "6.7G"}, {"files_on_disk": "1", "directory": "transfer.tgz", "size_on_disk": "761M"}, {"files_on_disk": "216", "directory": "data", "size_on_disk": "30G"}], "report_time_sent": "2012-08-06 16:20:53"}'}

The difference between report_time and report_time_sent, if there is a difference, is important. It can alert you to an error on the system preventing a new, valid report from being created by your shell script. It can also signal load issues if the gap is too wide.

Capture and store state data

Now we need to create spacereport.php, the aggregation script. It’s sole job is to receive the report and store it in MongoDB. This is made easy using PHP’s built in json_decode and MongoDB support. After the call to json_decode, the dates still need to be converted to MongoDates.

  // decode JSON report and convert dates to MongoDate
  $spacereport = json_decode($_POST['report'], true);
  $spacereport['report_time_sent'] = new MongoDate(strtotime($spacereport['report_time_sent']));
  $spacereport['report_time'] = new MongoDate(strtotime($spacereport['report_time']));
  // connect to MongoDB
  $mongoConnection = new Mongo("m1.example.com,m2.example.com", array("replicaSet" => "replicasetname"));
  // select a database
  $db = $mongoConnection->reports;
  // select a collection
  $collection = $db->spacereport;
  // add a record
} else {
  // this should probably set the STATUS to 405 Method Not Allowed
  echo 'not POST';

At this point, the data is now available in MongoDB. It’s possible to use Mongo’s query mechanism to query the data.

PRIMARY> db.spacereport.find({host: 't1.example.com'}).limit(1).pretty()
        "_id" : ObjectId("501fb1d47540c4df76000073"),
        "report_time" : ISODate("2012-08-06T12:00:11Z"),
        "host" : "t1.example.com",
        "space_data_by_directory" : [
                        "files_on_disk" : "101",
                        "directory" : "style ",
                        "size_on_disk" : "4.5M"
                        "files_on_disk" : "2001",
                        "directory" : "html ",
                        "size_on_disk" : "24M"
                        "files_on_disk" : "477505",
                        "directory" : "directory ",
                        "size_on_disk" : "6.7G"
                        "files_on_disk" : "1",
                        "directory" : "transfer.tgz ",
                        "size_on_disk" : "761M"
                        "files_on_disk" : "215",
                        "directory" : "data ",
                        "size_on_disk" : "30G"
        "report_time_sent" : ISODate("2012-08-06T12:00:20Z")

NOTE: For this system, historical data quickly diminishes in importance since what I’m interested in is the current state of replication. For that reason I made the spacereport collection capped to 1MB or 100 records.

PRIMARY> db.runCommand({"convertToCapped": "spacereport", size: 1045876, max: 100});
{ "ok" : 1 }

Display state data report

It’s not very useful to look at the data one record at a time, so we need some way of viewing the data as a whole. PHP is convenient, so we’ll use that to create a web based report.

body {
    font-family: Arial, Helvetica, Sans serif;
.directoryname {
    float: left;
    width: 350px;
.sizeondisk {
    float: left;
    text-align: right;
    width: 150px;
.numberoffiles {
    float: left;
    text-align: right;
    width: 150px;
$host_display_template = "<hr />\n<strong>%s</strong> showing report at <em>%s</em> (of %d total reports)<br />\n";
$spacedata_row_template = "<div class='directoryname'>%s</div> <div class='sizeondisk'>%s</div> <div class='numberoffiles'>%d total files</div><div style='clear: both;'></div>\n";
$mongoConnection = new Mongo("m1.example.com,m2.example.com", array("replicaSet" => "replicasetname"));
// select a database
$db = $mongoConnection->reports;
// select a collection
$collection = $db->spacereport;
// group the collection to get a unique list of all hosts reporting space data
$key = array("host" => 1);
$initial = array("reports" => 0);
$reduce = "function(obj, prev) {prev.reports++;}";
$reports_by_host = $collection->group($key, $initial, $reduce);
// cycle through all hosts found above
foreach ($reports_by_host['retval'] as $report_by_host) {
    // grab the reports for this host and sort to find most recent report
    $cursor = $collection->find(array("host" => $report_by_host['host']));
    $cursor->sort(array("report_time_sent" => -1))->limit(1);
    foreach ($cursor as $obj) {
        // output details about this host and the report timing
        printf ($host_display_template, $report_by_host['host'], date('M-d-Y H:i:s', $obj['report_time']->sec), $report_by_host['reports']);
        foreach ($obj["space_data_by_directory"] as $directory) {
            // output details about this directory
            printf ($spacedata_row_template, $directory["directory"], $directory["size_on_disk"], $directory["files_on_disk"]);

Centralizing the service

At this point the entire reporting structure is in place, but it requires manual installation or updates on each host where it runs. Even if you only have a handful of hosts, it can quickly become a pain to have to update them by hand each time you can to change the structure.

To get around this, host the two scripts responsible for creating and sending the report in some location that’s accessible to the target host. Then run the report from a third script that grabs the latest copies of those scripts and run the reports.

# download the latest spacereport scripts 
# and run to update central aggregation point
BASEDIR=$(dirname $0)
# use wget to grab the latest scripts
wget -q http://c.example.com/spacereport.py -O $BASEDIR/spacereport.py
wget -q http://c.example.com/spacereport.sh -O $BASEDIR/spacereport.sh
# make sure that spacereport.sh is executable
chmod +x $BASEDIR/spacereport.sh
# create and send a spacereport
python $BASEDIR/spacereport.py

Since MongoDB is schemaless, the structure of the reports can be changed at will. Provided legacy values are left in place, no changes are required at any other point in the reporting process.

Future Enhancements

Possible enhancements could include active monitoring, such as allowing an administrator to define rules that would trigger notifications based on the data being aggregated. This monitoring could be implemented as a hook in the spacereport.php aggregator script or based on a cron. Rules could include comparisons between hosts, self comparison with historical data for the same host, or comparison to baseline data external to the hosts being monitored.

Some refactoring to generalize the shell script that produces the initial plain text report may improve efficiency and/or flexibility, though it’s difficult to imagine that writing a custom program to replace existing system utilities would be worthwhile.

The ubiquity of support for MongoDB and JSON would make it easy to reimplement any of the above components in other languages if there’s a better fit for a certain project.

Software Engineering

PHP CodeSniffer and Mess Detecter in Netbeans

Code quality and structure becomes extremely significant as the number of developers on a project increases. It can also be helpful when the code is maintained or refactored infrequently. In both cases, it can reduce the time required to get your head back in the code.

Today I found a plugin for Netbeans that uses phpMD and PHP CodeSniffer to examine the code for a project and make recommendations to improve it. The feedback is visible in the tasks panel and will take you right to the place it suggests you change.

Here’s how I installed it on Windows using WAMPServer. I first installed PEAR on WAMPServer. Then I installed a few libraries using the PEAR installer as follows:

C:\wamp2.1e\bin\php\php5.2.11>pear install PHP_CodeSniffer
Unknown remote channel: pear.phpunit.de
Did not download optional dependencies: channel://pear.phpunit.de/PHP_Timer, use --alldeps to download automatically
pear/PHP_CodeSniffer can optionally use package "channel://pear.phpunit.de/PHP_Timer"
downloading PHP_CodeSniffer-1.3.2.tgz ...
Starting to download PHP_CodeSniffer-1.3.2.tgz (328,845 bytes)
....................................................................done: 328,845 bytes
install ok: channel://pear.php.net/PHP_CodeSniffer-1.3.2
C:\wamp2.1e\bin\php\php5.2.11>phpcs --version
PHP_CodeSniffer version 1.3.2 (stable) by Squiz Pty Ltd. (http://www.squiz.net)
C:\wamp2.1e\bin\php\php5.2.11>pear channel-discover pear.pdepend.org
Adding Channel "pear.pdepend.org" succeeded
Discovery of channel "pear.pdepend.org" succeeded
C:\wamp2.1e\bin\php\php5.2.11>pear remote-list -c pdepend
PACKAGE                            VERSION
PHP_CodeSniffer_Standards_PDepend2 1.0.0
PHP_Depend                         1.0.1
PHP_Depend_Log_Arbit               1.0.0
staticReflection                   1.0.0
C:\wamp2.1e\bin\php\php5.2.11>pear install pdepend/PHP_Depend
Did not download optional dependencies: pecl/imagick, use --alldeps to download automatically
pdepend/PHP_Depend can optionally use package "pecl/imagick" (version >= 2.2.0b2)
downloading PHP_Depend-1.0.1.tgz ...
Starting to download PHP_Depend-1.0.1.tgz (181,720 bytes)
......................................done: 181,720 bytes
install ok: channel://pear.pdepend.org/PHP_Depend-1.0.1
C:\wamp2.1e\bin\php\php5.2.11>pear channel-discover pear.phpmd.org
Adding Channel "pear.phpmd.org" succeeded
Discovery of channel "pear.phpmd.org" succeeded
C:\wamp2.1e\bin\php\php5.2.11>pear remote-list -c phpmd
PHP_PMD 1.3.0
C:\wamp2.1e\bin\php\php5.2.11>pear install phpmd/PHP_PMD
downloading PHP_PMD-1.3.0.tgz ...
Starting to download PHP_PMD-1.3.0.tgz (45,722 bytes)
.............done: 45,722 bytes
install ok: channel://pear.phpmd.org/PHP_PMD-1.3.0

You could also install phpMD and PHP CodeSniffer separately if you wanted to.

I then had to tell the Netbeans plugin where the batch files were. I did that by opening the Tools -> Options window and setting the phpMD and PHP CodeSniffer tabs as shown below.

At this point all of the output in my tasks view showed errors. After restarting Netbeans it all worked like a charm. Here’s what my output looked like.


That little process took me slightly longer than I would like to admit. At the end of it, most of the recommendations were about comment tags, white space and underscores for variables. It turns out I don’t really care much about those items, so I didn’t spend any time going through them.

It did identify a few variable names that it thought were too long, but then I prefer to be descriptive for the same reasons I mentioned at the beginning of this post, so I ignored those suggestions too.

Perhaps I’ll find other projects where this type of automated review gives me more actionable work, but not this time.

Software Engineering

Access Apache LDAP authentication details in PHP

Today I was working on a small web application that will run on a corporate intranet. There was an existing LDAP server and many existing web apps use the authentication details cached in the browser (Basic Authentication) to identify a user and determine access levels.

My application is written in PHP and I wanted to leverage this same mechanism to determine the current user and customize my application. Since my searches on Google didn’t pull up anything similar, I want to document what I did.

I did explore the possibility of using PHP’s LDAP library to perform the authentication but decided instead to use the basic authentication provided by Apache. I have two reasons for this. First is that most users are accustomed to this already and developers of other internal applications are familiar with this approach. Second is that authentication details are more easily cached for a long duration, which cuts down on reauthentication. In a trusted intranet environment this is very desirable.

Apache Configuration

To begin with, I installed and configured Apache on Ubuntu Linux. One of the modules that wasn’t enabled by default is authnz_ldap. I did notice that it was available so I ran this command to enable the module:

$ sudo a2enmod authnz_ldap
$ sudo /etc/init.d/apache2 restart

With this module installed I then needed to give it some details about my LDAP server and what paths I wanted protected by LDAP authentication. I accomplished this by adding a <Location> directive to the httpd.conf file. This is what my Location directive looked like:

<Location "/">
  AuthBasicProvider ldap
  AuthType Basic
  AuthzLDAPAuthoritative off
  AuthName "My Application"
  AuthLDAPURL "ldap://directory.example.com:389/DC=example,DC=com?sAMAccountName?sub?(objectClass=*)"
  AuthLDAPBindDN "CN=apache,CN=Users,DC=example,DC=com"
  AuthLDAPBindPassword hackme
  Require valid-user

After making this change another restart was necessary. At this point I reloaded a page that was protected and was able to authenticate as expected.

PHP Access to authentication

This is surprisingly easy (and surprisingly undocumented anywhere that I could find). PHP will automatically populate several $_SERVER superglobal keys with the authentication values cached in Apache. The key values are:


Later I found that you can also add additional values to AuthLDAPURL that will populate additional keys in your $_SERVER superglobal.

At this point you might choose to perform additional operations against the LDAP server using PHP’s library or simply use the available values to customize your intranet web application.