Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged vagrant

Software Engineering

Deploy MongoDB using Ansible

I’ve recently had some people ask how I deploy MongoDB. For a while I used MongoDB’s excellent online management tool to deploy and monitor my clusters. Unfortunately, they changed direction and I couldn’t afford their new tools, so I turned to Ansible.

To make the process easier to share, I posted a simple example that you can run locally with Vagrant to deploy MongoDB using Ansible.

https://github.com/dwatrous/ansible-mongodb

As soon as the Ansible script finishes running, you can immediately connect to MongoDB and start working with data.
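The basic workflow looks something like this (a sketch only; see the repository README for the exact steps, which may differ):

```shell
# Sketch of the workflow; consult the repository README for exact commands.
git clone https://github.com/dwatrous/ansible-mongodb.git
cd ansible-mongodb
vagrant up     # brings up the VM and runs the Ansible provisioning
vagrant ssh    # log in to the new server
mongo          # connect to MongoDB and start working with data
```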

[Screenshot: ansible-mongodb]

If you’re looking to learn more about MongoDB, check out the videos I published with Packt Publishing on end-to-end MongoDB.

Software Engineering

Install and configure a Multi-node Hadoop cluster using Ansible

I’ve recently been involved with several groups interested in using Hadoop to process large sets of data, including higher-level abstractions on top of Hadoop like Pig and Hive. What has surprised me most is that no one is automating their installation of Hadoop. In each case I’ve observed, they start by manually provisioning some servers and then follow a series of tutorials to manually install and configure a cluster. Setting up a cluster this way typically takes about a week, and much of that time is wasted sorting out networking and connectivity between hosts.

After telling several groups that they should automate the installation of Hadoop using something like Ansible, I decided to create an example. All the scripts to install a new Hadoop cluster in minutes are on GitHub for you to fork: https://github.com/dwatrous/hadoop-multi-server-ansible

I have also recorded a video demonstration of the process described below.

Scope

The scope of this article is to create a three-node cluster on a single computer (Windows in my case) using VirtualBox and Vagrant. The cluster includes HDFS and MapReduce running on all three nodes. The following diagram will help to visualize the cluster.

[Diagram: hadoop-design]
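For reference, the hosts-dev inventory in the repository defines the master and two datanodes along these lines (the hostnames and IP addresses are taken from the console output later in this post; the exact group names in the real file may differ):

```ini
; Sketch of the hosts-dev inventory; group names are an assumption,
; the IPs and hostnames appear in the provisioning output below.
[master]
192.168.51.4   ; hadoop-master

[datanodes]
192.168.51.5   ; hadoop-data1
192.168.51.6   ; hadoop-data2
```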

Build the servers

The first step is to install VirtualBox and Vagrant.

Clone hadoop-multi-server-ansible and open a console window in the directory where you cloned it. The Vagrantfile defines three Ubuntu 14.04 servers. Each server needs 3GB of RAM, so make sure you have enough RAM available. Now run vagrant up and wait a few minutes for the new servers to come up.
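A simplified sketch of what that multi-machine Vagrantfile looks like follows; the real file in the repository also wires up the file and shell provisioners shown in the output below, so treat this as illustrative rather than a drop-in replacement:

```ruby
# Simplified sketch of the three-server Vagrantfile. Hostnames, box and IPs
# match the provisioning output in this post; everything else is illustrative.
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  servers = { "master" => "192.168.51.4",
              "data1"  => "192.168.51.5",
              "data2"  => "192.168.51.6" }

  servers.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = name
      node.vm.network "private_network", ip: ip
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 3072   # each server needs 3GB RAM
      end
    end
  end
end
```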

C:\Users\watrous\Documents\hadoop>vagrant up
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'data1' up with 'virtualbox' provider...
Bringing machine 'data2' up with 'virtualbox' provider...
==> master: Importing base box 'ubuntu/trusty64'...
==> master: Matching MAC address for NAT networking...
==> master: Checking if box 'ubuntu/trusty64' is up to date...
==> master: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> master: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> master: `vagrant box update` to update.
==> master: Setting the name of the VM: master
==> master: Clearing any previously set forwarded ports...
==> master: Clearing any previously set network interfaces...
==> master: Preparing network interfaces based on configuration...
    master: Adapter 1: nat
    master: Adapter 2: hostonly
==> master: Forwarding ports...
    master: 22 => 2222 (adapter 1)
==> master: Running 'pre-boot' VM customizations...
==> master: Booting VM...
==> master: Waiting for machine to boot. This may take a few minutes...
    master: SSH address: 127.0.0.1:2222
    master: SSH username: vagrant
    master: SSH auth method: private key
    master: Warning: Connection timeout. Retrying...
==> master: Machine booted and ready!
==> master: Checking for guest additions in VM...
==> master: Setting hostname...
==> master: Configuring and enabling network interfaces...
==> master: Mounting shared folders...
    master: /home/vagrant/src => C:/Users/watrous/Documents/hadoop
==> master: Running provisioner: file...
==> master: Running provisioner: shell...
    master: Running: C:/Users/watrous/AppData/Local/Temp/vagrant-shell20150930-12444-1lgl5bq.sh
==> master: stdin: is not a tty
==> master: Ign http://archive.ubuntu.com trusty InRelease
==> master: Ign http://archive.ubuntu.com trusty-updates InRelease
==> master: Ign http://security.ubuntu.com trusty-security InRelease
==> master: Hit http://archive.ubuntu.com trusty Release.gpg
==> master: Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]
==> master: Get:2 http://archive.ubuntu.com trusty-updates Release.gpg [933 B]
==> master: Hit http://archive.ubuntu.com trusty Release
==> master: Get:3 http://security.ubuntu.com trusty-security Release [63.5 kB]
==> master: Get:4 http://archive.ubuntu.com trusty-updates Release [63.5 kB]
==> master: Get:5 http://archive.ubuntu.com trusty/main Sources [1,064 kB]
==> master: Get:6 http://security.ubuntu.com trusty-security/main Sources [96.2 kB]
==> master: Get:7 http://security.ubuntu.com trusty-security/universe Sources [31.1 kB]
==> master: Get:8 http://security.ubuntu.com trusty-security/main amd64 Packages [350 kB]
==> master: Get:9 http://archive.ubuntu.com trusty/universe Sources [6,399 kB]
==> master: Get:10 http://security.ubuntu.com trusty-security/universe amd64 Packages [117 kB]
==> master: Get:11 http://security.ubuntu.com trusty-security/main Translation-en [191 kB]
==> master: Get:12 http://security.ubuntu.com trusty-security/universe Translation-en [68.2 kB]
==> master: Hit http://archive.ubuntu.com trusty/main amd64 Packages
==> master: Hit http://archive.ubuntu.com trusty/universe amd64 Packages
==> master: Hit http://archive.ubuntu.com trusty/main Translation-en
==> master: Hit http://archive.ubuntu.com trusty/universe Translation-en
==> master: Get:13 http://archive.ubuntu.com trusty-updates/main Sources [236 kB]
==> master: Get:14 http://archive.ubuntu.com trusty-updates/universe Sources [139 kB]
==> master: Get:15 http://archive.ubuntu.com trusty-updates/main amd64 Packages [626 kB]
==> master: Get:16 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [320 kB]
==> master: Get:17 http://archive.ubuntu.com trusty-updates/main Translation-en [304 kB]
==> master: Get:18 http://archive.ubuntu.com trusty-updates/universe Translation-en [168 kB]
==> master: Ign http://archive.ubuntu.com trusty/main Translation-en_US
==> master: Ign http://archive.ubuntu.com trusty/universe Translation-en_US
==> master: Fetched 10.2 MB in 4s (2,098 kB/s)
==> master: Reading package lists...
==> master: Reading package lists...
==> master: Building dependency tree...
==> master:
==> master: Reading state information...
==> master: The following extra packages will be installed:
==> master:   build-essential dpkg-dev g++ g++-4.8 libalgorithm-diff-perl
==> master:   libalgorithm-diff-xs-perl libalgorithm-merge-perl libdpkg-perl libexpat1-dev
==> master:   libfile-fcntllock-perl libpython-dev libpython2.7-dev libstdc++-4.8-dev
==> master:   python-chardet-whl python-colorama python-colorama-whl python-distlib
==> master:   python-distlib-whl python-html5lib python-html5lib-whl python-pip-whl
==> master:   python-requests-whl python-setuptools python-setuptools-whl python-six-whl
==> master:   python-urllib3-whl python-wheel python2.7-dev python3-pkg-resources
==> master: Suggested packages:
==> master:   debian-keyring g++-multilib g++-4.8-multilib gcc-4.8-doc libstdc++6-4.8-dbg
==> master:   libstdc++-4.8-doc python-genshi python-lxml python3-setuptools zip
==> master: Recommended packages:
==> master:   python-dev-all
==> master: The following NEW packages will be installed:
==> master:   build-essential dpkg-dev g++ g++-4.8 libalgorithm-diff-perl
==> master:   libalgorithm-diff-xs-perl libalgorithm-merge-perl libdpkg-perl libexpat1-dev
==> master:   libfile-fcntllock-perl libpython-dev libpython2.7-dev libstdc++-4.8-dev
==> master:   python-chardet-whl python-colorama python-colorama-whl python-dev
==> master:   python-distlib python-distlib-whl python-html5lib python-html5lib-whl
==> master:   python-pip python-pip-whl python-requests-whl python-setuptools
==> master:   python-setuptools-whl python-six-whl python-urllib3-whl python-wheel
==> master:   python2.7-dev python3-pkg-resources unzip
==> master: 0 upgraded, 32 newly installed, 0 to remove and 29 not upgraded.
==> master: Need to get 41.3 MB of archives.
==> master: After this operation, 80.4 MB of additional disk space will be used.
==> master: Get:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libexpat1-dev amd64 2.1.0-4ubuntu1.1 [115 kB]
==> master: Get:2 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libpython2.7-dev amd64 2.7.6-8ubuntu0.2 [22.0 MB]
==> master: Get:3 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libstdc++-4.8-dev amd64 4.8.4-2ubuntu1~14.04 [1,052 kB]
==> master: Get:4 http://archive.ubuntu.com/ubuntu/ trusty-updates/main g++-4.8 amd64 4.8.4-2ubuntu1~14.04 [15.0 MB]
==> master: Get:5 http://archive.ubuntu.com/ubuntu/ trusty/main g++ amd64 4:4.8.2-1ubuntu6 [1,490 B]
==> master: Get:6 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libdpkg-perl all 1.17.5ubuntu5.4 [179 kB]
==> master: Get:7 http://archive.ubuntu.com/ubuntu/ trusty-updates/main dpkg-dev all 1.17.5ubuntu5.4 [726 kB]
==> master: Get:8 http://archive.ubuntu.com/ubuntu/ trusty/main build-essential amd64 11.6ubuntu6 [4,838 B]
==> master: Get:9 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-diff-perl all 1.19.02-3 [50.0 kB]
==> master: Get:10 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-diff-xs-perl amd64 0.04-2build4 [12.6 kB]
==> master: Get:11 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-merge-perl all 0.08-2 [12.7 kB]
==> master: Get:12 http://archive.ubuntu.com/ubuntu/ trusty/main libfile-fcntllock-perl amd64 0.14-2build1 [15.9 kB]
==> master: Get:13 http://archive.ubuntu.com/ubuntu/ trusty/main libpython-dev amd64 2.7.5-5ubuntu3 [7,078 B]
==> master: Get:14 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python3-pkg-resources all 3.3-1ubuntu2 [31.7 kB]
==> master: Get:15 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-chardet-whl all 2.2.1-2~ubuntu1 [170 kB]
==> master: Get:16 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-colorama all 0.2.5-0.1ubuntu2 [18.4 kB]
==> master: Get:17 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-colorama-whl all 0.2.5-0.1ubuntu2 [18.2 kB]
==> master: Get:18 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python2.7-dev amd64 2.7.6-8ubuntu0.2 [269 kB]
==> master: Get:19 http://archive.ubuntu.com/ubuntu/ trusty/main python-dev amd64 2.7.5-5ubuntu3 [1,166 B]
==> master: Get:20 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-distlib all 0.1.8-1ubuntu1 [113 kB]
==> master: Get:21 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-distlib-whl all 0.1.8-1ubuntu1 [140 kB]
==> master: Get:22 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-html5lib all 0.999-3~ubuntu1 [83.5 kB]
==> master: Get:23 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-html5lib-whl all 0.999-3~ubuntu1 [109 kB]
==> master: Get:24 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-six-whl all 1.5.2-1ubuntu1 [10.5 kB]
==> master: Get:25 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-urllib3-whl all 1.7.1-1ubuntu3 [64.0 kB]
==> master: Get:26 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-requests-whl all 2.2.1-1ubuntu0.3 [227 kB]
==> master: Get:27 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-setuptools-whl all 3.3-1ubuntu2 [244 kB]
==> master: Get:28 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-pip-whl all 1.5.4-1ubuntu3 [111 kB]
==> master: Get:29 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-setuptools all 3.3-1ubuntu2 [230 kB]
==> master: Get:30 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-pip all 1.5.4-1ubuntu3 [97.2 kB]
==> master: Get:31 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-wheel all 0.24.0-1~ubuntu1 [44.7 kB]
==> master: Get:32 http://archive.ubuntu.com/ubuntu/ trusty-updates/main unzip amd64 6.0-9ubuntu1.3 [157 kB]
==> master: dpkg-preconfigure: unable to re-open stdin: No such file or directory
==> master: Fetched 41.3 MB in 20s (2,027 kB/s)
==> master: Selecting previously unselected package libexpat1-dev:amd64.
==> master: (Reading database ... 61002 files and directories currently installed.)
==> master: Preparing to unpack .../libexpat1-dev_2.1.0-4ubuntu1.1_amd64.deb ...
==> master: Unpacking libexpat1-dev:amd64 (2.1.0-4ubuntu1.1) ...
==> master: Selecting previously unselected package libpython2.7-dev:amd64.
==> master: Preparing to unpack .../libpython2.7-dev_2.7.6-8ubuntu0.2_amd64.deb ...
==> master: Unpacking libpython2.7-dev:amd64 (2.7.6-8ubuntu0.2) ...
==> master: Selecting previously unselected package libstdc++-4.8-dev:amd64.
==> master: Preparing to unpack .../libstdc++-4.8-dev_4.8.4-2ubuntu1~14.04_amd64.deb ...
==> master: Unpacking libstdc++-4.8-dev:amd64 (4.8.4-2ubuntu1~14.04) ...
==> master: Selecting previously unselected package g++-4.8.
==> master: Preparing to unpack .../g++-4.8_4.8.4-2ubuntu1~14.04_amd64.deb ...
==> master: Unpacking g++-4.8 (4.8.4-2ubuntu1~14.04) ...
==> master: Selecting previously unselected package g++.
==> master: Preparing to unpack .../g++_4%3a4.8.2-1ubuntu6_amd64.deb ...
==> master: Unpacking g++ (4:4.8.2-1ubuntu6) ...
==> master: Selecting previously unselected package libdpkg-perl.
==> master: Preparing to unpack .../libdpkg-perl_1.17.5ubuntu5.4_all.deb ...
==> master: Unpacking libdpkg-perl (1.17.5ubuntu5.4) ...
==> master: Selecting previously unselected package dpkg-dev.
==> master: Preparing to unpack .../dpkg-dev_1.17.5ubuntu5.4_all.deb ...
==> master: Unpacking dpkg-dev (1.17.5ubuntu5.4) ...
==> master: Selecting previously unselected package build-essential.
==> master: Preparing to unpack .../build-essential_11.6ubuntu6_amd64.deb ...
==> master: Unpacking build-essential (11.6ubuntu6) ...
==> master: Selecting previously unselected package libalgorithm-diff-perl.
==> master: Preparing to unpack .../libalgorithm-diff-perl_1.19.02-3_all.deb ...
==> master: Unpacking libalgorithm-diff-perl (1.19.02-3) ...
==> master: Selecting previously unselected package libalgorithm-diff-xs-perl.
==> master: Preparing to unpack .../libalgorithm-diff-xs-perl_0.04-2build4_amd64.deb ...
==> master: Unpacking libalgorithm-diff-xs-perl (0.04-2build4) ...
==> master: Selecting previously unselected package libalgorithm-merge-perl.
==> master: Preparing to unpack .../libalgorithm-merge-perl_0.08-2_all.deb ...
==> master: Unpacking libalgorithm-merge-perl (0.08-2) ...
==> master: Selecting previously unselected package libfile-fcntllock-perl.
==> master: Preparing to unpack .../libfile-fcntllock-perl_0.14-2build1_amd64.deb ...
==> master: Unpacking libfile-fcntllock-perl (0.14-2build1) ...
==> master: Selecting previously unselected package libpython-dev:amd64.
==> master: Preparing to unpack .../libpython-dev_2.7.5-5ubuntu3_amd64.deb ...
==> master: Unpacking libpython-dev:amd64 (2.7.5-5ubuntu3) ...
==> master: Selecting previously unselected package python3-pkg-resources.
==> master: Preparing to unpack .../python3-pkg-resources_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python3-pkg-resources (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-chardet-whl.
==> master: Preparing to unpack .../python-chardet-whl_2.2.1-2~ubuntu1_all.deb ...
==> master: Unpacking python-chardet-whl (2.2.1-2~ubuntu1) ...
==> master: Selecting previously unselected package python-colorama.
==> master: Preparing to unpack .../python-colorama_0.2.5-0.1ubuntu2_all.deb ...
==> master: Unpacking python-colorama (0.2.5-0.1ubuntu2) ...
==> master: Selecting previously unselected package python-colorama-whl.
==> master: Preparing to unpack .../python-colorama-whl_0.2.5-0.1ubuntu2_all.deb ...
==> master: Unpacking python-colorama-whl (0.2.5-0.1ubuntu2) ...
==> master: Selecting previously unselected package python2.7-dev.
==> master: Preparing to unpack .../python2.7-dev_2.7.6-8ubuntu0.2_amd64.deb ...
==> master: Unpacking python2.7-dev (2.7.6-8ubuntu0.2) ...
==> master: Selecting previously unselected package python-dev.
==> master: Preparing to unpack .../python-dev_2.7.5-5ubuntu3_amd64.deb ...
==> master: Unpacking python-dev (2.7.5-5ubuntu3) ...
==> master: Selecting previously unselected package python-distlib.
==> master: Preparing to unpack .../python-distlib_0.1.8-1ubuntu1_all.deb ...
==> master: Unpacking python-distlib (0.1.8-1ubuntu1) ...
==> master: Selecting previously unselected package python-distlib-whl.
==> master: Preparing to unpack .../python-distlib-whl_0.1.8-1ubuntu1_all.deb ...
==> master: Unpacking python-distlib-whl (0.1.8-1ubuntu1) ...
==> master: Selecting previously unselected package python-html5lib.
==> master: Preparing to unpack .../python-html5lib_0.999-3~ubuntu1_all.deb ...
==> master: Unpacking python-html5lib (0.999-3~ubuntu1) ...
==> master: Selecting previously unselected package python-html5lib-whl.
==> master: Preparing to unpack .../python-html5lib-whl_0.999-3~ubuntu1_all.deb ...
==> master: Unpacking python-html5lib-whl (0.999-3~ubuntu1) ...
==> master: Selecting previously unselected package python-six-whl.
==> master: Preparing to unpack .../python-six-whl_1.5.2-1ubuntu1_all.deb ...
==> master: Unpacking python-six-whl (1.5.2-1ubuntu1) ...
==> master: Selecting previously unselected package python-urllib3-whl.
==> master: Preparing to unpack .../python-urllib3-whl_1.7.1-1ubuntu3_all.deb ...
==> master: Unpacking python-urllib3-whl (1.7.1-1ubuntu3) ...
==> master: Selecting previously unselected package python-requests-whl.
==> master: Preparing to unpack .../python-requests-whl_2.2.1-1ubuntu0.3_all.deb ...
==> master: Unpacking python-requests-whl (2.2.1-1ubuntu0.3) ...
==> master: Selecting previously unselected package python-setuptools-whl.
==> master: Preparing to unpack .../python-setuptools-whl_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python-setuptools-whl (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-pip-whl.
==> master: Preparing to unpack .../python-pip-whl_1.5.4-1ubuntu3_all.deb ...
==> master: Unpacking python-pip-whl (1.5.4-1ubuntu3) ...
==> master: Selecting previously unselected package python-setuptools.
==> master: Preparing to unpack .../python-setuptools_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python-setuptools (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-pip.
==> master: Preparing to unpack .../python-pip_1.5.4-1ubuntu3_all.deb ...
==> master: Unpacking python-pip (1.5.4-1ubuntu3) ...
==> master: Selecting previously unselected package python-wheel.
==> master: Preparing to unpack .../python-wheel_0.24.0-1~ubuntu1_all.deb ...
==> master: Unpacking python-wheel (0.24.0-1~ubuntu1) ...
==> master: Selecting previously unselected package unzip.
==> master: Preparing to unpack .../unzip_6.0-9ubuntu1.3_amd64.deb ...
==> master: Unpacking unzip (6.0-9ubuntu1.3) ...
==> master: Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
==> master: Processing triggers for mime-support (3.54ubuntu1.1) ...
==> master: Setting up libexpat1-dev:amd64 (2.1.0-4ubuntu1.1) ...
==> master: Setting up libpython2.7-dev:amd64 (2.7.6-8ubuntu0.2) ...
==> master: Setting up libstdc++-4.8-dev:amd64 (4.8.4-2ubuntu1~14.04) ...
==> master: Setting up g++-4.8 (4.8.4-2ubuntu1~14.04) ...
==> master: Setting up g++ (4:4.8.2-1ubuntu6) ...
==> master: update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
==> master: Setting up libdpkg-perl (1.17.5ubuntu5.4) ...
==> master: Setting up dpkg-dev (1.17.5ubuntu5.4) ...
==> master: Setting up build-essential (11.6ubuntu6) ...
==> master: Setting up libalgorithm-diff-perl (1.19.02-3) ...
==> master: Setting up libalgorithm-diff-xs-perl (0.04-2build4) ...
==> master: Setting up libalgorithm-merge-perl (0.08-2) ...
==> master: Setting up libfile-fcntllock-perl (0.14-2build1) ...
==> master: Setting up libpython-dev:amd64 (2.7.5-5ubuntu3) ...
==> master: Setting up python3-pkg-resources (3.3-1ubuntu2) ...
==> master: Setting up python-chardet-whl (2.2.1-2~ubuntu1) ...
==> master: Setting up python-colorama (0.2.5-0.1ubuntu2) ...
==> master: Setting up python-colorama-whl (0.2.5-0.1ubuntu2) ...
==> master: Setting up python2.7-dev (2.7.6-8ubuntu0.2) ...
==> master: Setting up python-dev (2.7.5-5ubuntu3) ...
==> master: Setting up python-distlib (0.1.8-1ubuntu1) ...
==> master: Setting up python-distlib-whl (0.1.8-1ubuntu1) ...
==> master: Setting up python-html5lib (0.999-3~ubuntu1) ...
==> master: Setting up python-html5lib-whl (0.999-3~ubuntu1) ...
==> master: Setting up python-six-whl (1.5.2-1ubuntu1) ...
==> master: Setting up python-urllib3-whl (1.7.1-1ubuntu3) ...
==> master: Setting up python-requests-whl (2.2.1-1ubuntu0.3) ...
==> master: Setting up python-setuptools-whl (3.3-1ubuntu2) ...
==> master: Setting up python-pip-whl (1.5.4-1ubuntu3) ...
==> master: Setting up python-setuptools (3.3-1ubuntu2) ...
==> master: Setting up python-pip (1.5.4-1ubuntu3) ...
==> master: Setting up python-wheel (0.24.0-1~ubuntu1) ...
==> master: Setting up unzip (6.0-9ubuntu1.3) ...
==> master: Downloading/unpacking ansible
==> master:   Running setup.py (path:/tmp/pip_build_root/ansible/setup.py) egg_info for package ansible
==> master:
==> master:     no previously-included directories found matching 'v2'
==> master:     no previously-included directories found matching 'docsite'
==> master:     no previously-included directories found matching 'ticket_stubs'
==> master:     no previously-included directories found matching 'packaging'
==> master:     no previously-included directories found matching 'test'
==> master:     no previously-included directories found matching 'hacking'
==> master:     no previously-included directories found matching 'lib/ansible/modules/core/.git'
==> master:     no previously-included directories found matching 'lib/ansible/modules/extras/.git'
==> master: Downloading/unpacking paramiko (from ansible)
==> master: Downloading/unpacking jinja2 (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): PyYAML in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): pycrypto>=2.6 in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Downloading/unpacking ecdsa>=0.11 (from paramiko->ansible)
==> master: Downloading/unpacking MarkupSafe (from jinja2->ansible)
==> master:   Downloading MarkupSafe-0.23.tar.gz
==> master:   Running setup.py (path:/tmp/pip_build_root/MarkupSafe/setup.py) egg_info for package MarkupSafe
==> master:
==> master: Installing collected packages: ansible, paramiko, jinja2, ecdsa, MarkupSafe
==> master:   Running setup.py install for ansible
==> master:     changing mode of build/scripts-2.7/ansible from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-playbook from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-pull from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-doc from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-galaxy from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-vault from 644 to 755
==> master:
==> master:     no previously-included directories found matching 'v2'
==> master:     no previously-included directories found matching 'docsite'
==> master:     no previously-included directories found matching 'ticket_stubs'
==> master:     no previously-included directories found matching 'test'
==> master:     no previously-included directories found matching 'hacking'
==> master:     no previously-included directories found matching 'lib/ansible/modules/core/.git'
==> master:     no previously-included directories found matching 'lib/ansible/modules/extras/.git'
==> master:     changing mode of /usr/local/bin/ansible-galaxy to 755
==> master:     changing mode of /usr/local/bin/ansible-playbook to 755
==> master:     changing mode of /usr/local/bin/ansible-doc to 755
==> master:     changing mode of /usr/local/bin/ansible-pull to 755
==> master:     changing mode of /usr/local/bin/ansible-vault to 755
==> master:     changing mode of /usr/local/bin/ansible to 755
==> master:   Running setup.py install for MarkupSafe
==> master:
==> master:     building 'markupsafe._speedups' extension
==> master:     x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c markupsafe/_speedups.c -o build/temp.linux-x86_64-2.7/markupsafe/_speedups.o
==> master:     x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/markupsafe/_speedups.o -o build/lib.linux-x86_64-2.7/markupsafe/_speedups.so
==> master: Successfully installed ansible paramiko jinja2 ecdsa MarkupSafe
==> master: Cleaning up...
==> master: # 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
==> master: # 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
==> master: read (192.168.51.6): No route to host
==> master: read (192.168.51.6): No route to host
==> data1: Importing base box 'ubuntu/trusty64'...
==> data1: Matching MAC address for NAT networking...
==> data1: Checking if box 'ubuntu/trusty64' is up to date...
==> data1: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> data1: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> data1: `vagrant box update` to update.
==> data1: Setting the name of the VM: data1
==> data1: Clearing any previously set forwarded ports...
==> data1: Fixed port collision for 22 => 2222. Now on port 2200.
==> data1: Clearing any previously set network interfaces...
==> data1: Preparing network interfaces based on configuration...
    data1: Adapter 1: nat
    data1: Adapter 2: hostonly
==> data1: Forwarding ports...
    data1: 22 => 2200 (adapter 1)
==> data1: Running 'pre-boot' VM customizations...
==> data1: Booting VM...
==> data1: Waiting for machine to boot. This may take a few minutes...
    data1: SSH address: 127.0.0.1:2200
    data1: SSH username: vagrant
    data1: SSH auth method: private key
    data1: Warning: Connection timeout. Retrying...
==> data1: Machine booted and ready!
==> data1: Checking for guest additions in VM...
==> data1: Setting hostname...
==> data1: Configuring and enabling network interfaces...
==> data1: Mounting shared folders...
    data1: /vagrant => C:/Users/watrous/Documents/hadoop
==> data1: Running provisioner: file...
==> data2: Importing base box 'ubuntu/trusty64'...
==> data2: Matching MAC address for NAT networking...
==> data2: Checking if box 'ubuntu/trusty64' is up to date...
==> data2: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> data2: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> data2: `vagrant box update` to update.
==> data2: Setting the name of the VM: data2
==> data2: Clearing any previously set forwarded ports...
==> data2: Fixed port collision for 22 => 2222. Now on port 2201.
==> data2: Clearing any previously set network interfaces...
==> data2: Preparing network interfaces based on configuration...
    data2: Adapter 1: nat
    data2: Adapter 2: hostonly
==> data2: Forwarding ports...
    data2: 22 => 2201 (adapter 1)
==> data2: Running 'pre-boot' VM customizations...
==> data2: Booting VM...
==> data2: Waiting for machine to boot. This may take a few minutes...
    data2: SSH address: 127.0.0.1:2201
    data2: SSH username: vagrant
    data2: SSH auth method: private key
    data2: Warning: Connection timeout. Retrying...
==> data2: Machine booted and ready!
==> data2: Checking for guest additions in VM...
==> data2: Setting hostname...
==> data2: Configuring and enabling network interfaces...
==> data2: Mounting shared folders...
    data2: /vagrant => C:/Users/watrous/Documents/hadoop
==> data2: Running provisioner: file...

The output above shows the bootstrap-master.sh script installing Ansible and the other required libraries. At this point all three servers are ready for Hadoop to be installed, and your VirtualBox console should look something like this:

[Screenshot: virtualbox-hadoop-hosts]

Limit to a single datanode

If you are low on RAM, you can make a couple of small changes to install only two servers to the same effect. To do this, change the following files:

  • Vagrantfile: Remove or comment out the definition of the unwanted datanode
  • group_vars/all: Remove or comment out the unused host
  • hosts-dev: Remove or comment out the unused host

Conversely, you can add as many datanodes as you like by modifying the same files. Those changes will trickle through to as many hosts as you define. I’ll discuss that more in a future post, when we use these same Ansible scripts to deploy to a cloud provider.
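For example, the host list in group_vars/all looks something like the following sketch (the ip/hostname structure is visible in the Ansible output later in this post, but verify against the actual file); dropping data2 is a matter of commenting out its entry:

```yaml
# Sketch of the host list in group_vars/all. The ip/hostname item structure
# matches the playbook output below; the surrounding layout is an assumption.
hosts:
  - ip: 192.168.51.4
    hostname: hadoop-master
  - ip: 192.168.51.5
    hostname: hadoop-data1
  # Comment out an entry to remove that datanode from the cluster:
  # - ip: 192.168.51.6
  #   hostname: hadoop-data2
```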

Install Hadoop

It’s now time to install Hadoop. There are several commented lines in the bootstrap-master.sh script that you can copy and paste to perform the next few steps. The easiest approach is to log in to the hadoop-master server and run the Ansible playbook.

Proxy management

If you happen to be behind a proxy, you’ll need to update the proxy settings in bootstrap-master.sh and group_vars/all. In group_vars/all, if you don’t have a proxy, leave the none: false setting in place; otherwise the Ansible playbook will fail, since it expects that value to be a dictionary.
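Concretely, the relevant part of group_vars/all might look like this sketch (the none: false placeholder comes from the text above; the specific proxy key names are an assumption, so check the actual file):

```yaml
# No proxy: keep the placeholder so the variable is still a dictionary,
# which is what the playbook expects.
proxy:
  none: false

# Behind a proxy: replace the placeholder with real settings, e.g.
# (key names are hypothetical -- match whatever the playbook references):
# proxy:
#   http_proxy: http://proxy.example.com:8080
#   https_proxy: http://proxy.example.com:8080
```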

Run the Ansible playbook

Below you can see the Ansible output from configuring and installing Hadoop and all its dependencies on all three servers in your new cluster.

vagrant@hadoop-master:~$ cd src/
vagrant@hadoop-master:~/src$ ansible-playbook -i hosts-dev playbook.yml
 
PLAY [Install hadoop master node] *********************************************
 
GATHERING FACTS ***************************************************************
ok: [192.168.51.4]
 
TASK: [common | group name=hadoop state=present] ******************************
changed: [192.168.51.4]
 
TASK: [common | user name=hadoop comment="Hadoop" group=hadoop shell=/bin/bash] ***
changed: [192.168.51.4]
 
TASK: [common | authorized_key user=hadoop key="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWeJfgWx7hDeZUJOeaIVzcbmYxzMcWfxhgC2975tvGL5BV6unzLz8ZVak6ju++AvnM5mcQp6Ydv73uWyaoQaFZigAzfuenruQkwc7D5YYuba+FgZdQ8VHon29oQA3iaZWG7xTspagrfq3fcqaz2ZIjzqN+E/MtcW08PwfibN2QRWchBCuZ1Q8AmrW7gClzMcgd/uj3TstabspGaaZMCs8aC9JWzZlMMegXKYHvVQs6xH2AmifpKpLoMTdO8jP4jczmGebPzvaXmvVylgwo6bRJ3tyYAmGwx8PHj2EVVQ0XX9ipgixLyAa2c7+/crPpGmKFRrYibCCT6x65px7nWnn3"] ***
changed: [192.168.51.4]
 
TASK: [common | unpack hadoop] ************************************************
changed: [192.168.51.4]
 
TASK: [common | command mv /usr/local/hadoop-2.7.1 /usr/local/hadoop creates=/usr/local/hadoop removes=/usr/local/hadoop-2.7.1] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_HOME=" line="export HADOOP_HOME=/usr/local/hadoop"] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="PATH=" line="export PATH=$PATH:$HADOOP_HOME/bin"] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_SSH_OPTS=" line="export HADOOP_SSH_OPTS=\"-i /home/hadoop/.ssh/hadoop_rsa\""] ***
changed: [192.168.51.4]
 
TASK: [common | Build hosts file] *********************************************
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [common | lineinfile dest=/etc/hosts regexp='127.0.1.1' state=absent] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/tmp state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/namenode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/datanode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | Add the service scripts] **************************************
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
 
TASK: [common | lineinfile dest=/usr/local/hadoop/etc/hadoop/hadoop-env.sh regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.4]
 
TASK: [common | ensure hostkeys is a known host] ******************************
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [oraclejava8 | apt_repository repo='deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | apt_repository repo='deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | debconf name='oracle-java8-installer' question='shared/accepted-oracle-license-v1-1' value='true' vtype='select' unseen=false] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | apt_key keyserver=keyserver.ubuntu.com id=EEA14886] ******
changed: [192.168.51.4]
 
TASK: [oraclejava8 | Install Java] ********************************************
changed: [192.168.51.4]
 
TASK: [oraclejava8 | lineinfile dest=/home/hadoop/.bashrc regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.4]
 
TASK: [master | Copy private key into place] **********************************
changed: [192.168.51.4]
 
TASK: [master | Copy slaves into place] ***************************************
changed: [192.168.51.4]
 
TASK: [master | prepare known_hosts] ******************************************
# 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# 192.168.51.5 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# 192.168.51.6 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [master | add 0.0.0.0 to known_hosts for secondary namenode] ************
# 0.0.0.0 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4]
 
PLAY [Install hadoop data nodes] **********************************************
 
GATHERING FACTS ***************************************************************
ok: [192.168.51.5]
ok: [192.168.51.6]
 
TASK: [common | group name=hadoop state=present] ******************************
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | user name=hadoop comment="Hadoop" group=hadoop shell=/bin/bash] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | authorized_key user=hadoop key="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWeJfgWx7hDeZUJOeaIVzcbmYxzMcWfxhgC2975tvGL5BV6unzLz8ZVak6ju++AvnM5mcQp6Ydv73uWyaoQaFZigAzfuenruQkwc7D5YYuba+FgZdQ8VHon29oQA3iaZWG7xTspagrfq3fcqaz2ZIjzqN+E/MtcW08PwfibN2QRWchBCuZ1Q8AmrW7gClzMcgd/uj3TstabspGaaZMCs8aC9JWzZlMMegXKYHvVQs6xH2AmifpKpLoMTdO8jP4jczmGebPzvaXmvVylgwo6bRJ3tyYAmGwx8PHj2EVVQ0XX9ipgixLyAa2c7+/crPpGmKFRrYibCCT6x65px7nWnn3"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | unpack hadoop] ************************************************
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | command mv /usr/local/hadoop-2.7.1 /usr/local/hadoop creates=/usr/local/hadoop removes=/usr/local/hadoop-2.7.1] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_HOME=" line="export HADOOP_HOME=/usr/local/hadoop"] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="PATH=" line="export PATH=$PATH:$HADOOP_HOME/bin"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_SSH_OPTS=" line="export HADOOP_SSH_OPTS=\"-i /home/hadoop/.ssh/hadoop_rsa\""] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | Build hosts file] *********************************************
changed: [192.168.51.5] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.5] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.5] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [common | lineinfile dest=/etc/hosts regexp='127.0.1.1' state=absent] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | file path=/home/hadoop/tmp state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/namenode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/datanode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | Add the service scripts] **************************************
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
 
TASK: [common | lineinfile dest=/usr/local/hadoop/etc/hadoop/hadoop-env.sh regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | ensure hostkeys is a known host] ******************************
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.6] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [oraclejava8 | apt_repository repo='deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | apt_repository repo='deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | debconf name='oracle-java8-installer' question='shared/accepted-oracle-license-v1-1' value='true' vtype='select' unseen=false] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | apt_key keyserver=keyserver.ubuntu.com id=EEA14886] ******
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | Install Java] ********************************************
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [oraclejava8 | lineinfile dest=/home/hadoop/.bashrc regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
PLAY RECAP ********************************************************************
192.168.51.4               : ok=27   changed=26   unreachable=0    failed=0
192.168.51.5               : ok=23   changed=22   unreachable=0    failed=0
192.168.51.6               : ok=23   changed=22   unreachable=0    failed=0
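Judging from the play and role names in the output above, playbook.yml is organized as two plays that share a couple of roles. A hedged structural sketch (host group names and the exact role layout are assumptions; the repository is authoritative):

```yaml
# playbook.yml -- structural sketch inferred from the output above.
- name: Install hadoop master node
  hosts: hadoop-master        # group name assumed; defined in hosts-dev
  roles:
    - common                  # hadoop user/group, unpack Hadoop, config files
    - oraclejava8             # Oracle Java 8 from the webupd8team PPA
    - master                  # private key, slaves file, known_hosts entries

- name: Install hadoop data nodes
  hosts: hadoop-data          # group name assumed; covers both data nodes
  roles:
    - common
    - oraclejava8
```

The data-node play reuses common and oraclejava8 unchanged, which is why the second half of the output repeats the same tasks against 192.168.51.5 and 192.168.51.6.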

Start Hadoop and run a job

Now that you have Hadoop installed, it’s time to format HDFS and start up all the services. All the commands to do this are available as comments in the bootstrap-master.sh file. The first step is to format the HDFS NameNode. All of the commands that follow are executed as the hadoop user.

vagrant@hadoop-master:~/src$ sudo su - hadoop
hadoop@hadoop-master:~$ hdfs namenode -format
15/09/30 16:06:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/192.168.51.4
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
STARTUP_MSG:   classpath = [truncated]
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
STARTUP_MSG:   java = 1.8.0_60
************************************************************/
15/09/30 16:06:36 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/30 16:06:36 INFO namenode.NameNode: createNameNode [-format]
15/09/30 16:06:36 WARN common.Util: Path /home/hadoop/hadoop-data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
15/09/30 16:06:36 WARN common.Util: Path /home/hadoop/hadoop-data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-1c37e2f0-ba4b-4ad7-84d7-223dec53d34a
15/09/30 16:06:36 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/30 16:06:36 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/30 16:06:36 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/30 16:06:36 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/30 16:06:36 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/30 16:06:36 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Sep 30 16:06:36
15/09/30 16:06:36 INFO util.GSet: Computing capacity for map BlocksMap
15/09/30 16:06:36 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:36 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/09/30 16:06:36 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/09/30 16:06:36 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: defaultReplication         = 2
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxReplication             = 512
15/09/30 16:06:36 INFO blockmanagement.BlockManager: minReplication             = 1
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/09/30 16:06:36 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/30 16:06:36 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/09/30 16:06:36 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
15/09/30 16:06:36 INFO namenode.FSNamesystem: supergroup          = supergroup
15/09/30 16:06:36 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/30 16:06:36 INFO namenode.FSNamesystem: HA Enabled: false
15/09/30 16:06:36 INFO namenode.FSNamesystem: Append Enabled: true
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map INodeMap
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/09/30 16:06:37 INFO namenode.FSDirectory: ACLs enabled? false
15/09/30 16:06:37 INFO namenode.FSDirectory: XAttrs enabled? true
15/09/30 16:06:37 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
15/09/30 16:06:37 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/30 16:06:37 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/30 16:06:37 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/09/30 16:06:37 INFO namenode.FSImage: Allocated new BlockPoolId: BP-992546781-192.168.51.4-1443629197156
15/09/30 16:06:37 INFO common.Storage: Storage directory /home/hadoop/hadoop-data/hdfs/namenode has been successfully formatted.
15/09/30 16:06:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/30 16:06:37 INFO util.ExitUtil: Exiting with status 0
15/09/30 16:06:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.51.4
************************************************************/

Start DFS

Next, start the DFS services, as shown.

hadoop@hadoop-master:~$ /usr/local/hadoop/sbin/start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: Warning: Permanently added the RSA host key for IP address '192.168.51.4' to the list of known hosts.
hadoop-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-data2: Warning: Permanently added the RSA host key for IP address '192.168.51.6' to the list of known hosts.
hadoop-data1: Warning: Permanently added the RSA host key for IP address '192.168.51.5' to the list of known hosts.
hadoop-master: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-master.out
hadoop-data2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-data2.out
hadoop-data1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-data1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out

At this point you can access the HDFS status and see all three datanodes attached with this URL: http://192.168.51.4:50070/dfshealth.html#tab-datanode.

Start YARN

Next, start the YARN services, as shown.

hadoop@hadoop-master:~$ /usr/local/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hadoop-master.out
hadoop-data2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-data2.out
hadoop-data1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-data1.out
hadoop-master: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-master.out

At this point you can access information about the compute nodes in the cluster and currently running jobs at this URL: http://192.168.51.4:8088/cluster/nodes

Verify that Java processes are running

Hadoop provides a useful script to run a command on all nodes listed in the slaves file. For example, you can confirm that all the expected Java processes are running with the following command.

hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
hadoop-data2: 3872 DataNode
hadoop-data2: 4180 Jps
hadoop-data2: 4021 NodeManager
hadoop-master: 7617 NameNode
hadoop-data1: 3872 DataNode
hadoop-data1: 4180 Jps
hadoop-master: 8675 Jps
hadoop-data1: 4021 NodeManager
hadoop-master: 8309 NodeManager
hadoop-master: 8150 ResourceManager
hadoop-master: 7993 SecondaryNameNode
hadoop-master: 7788 DataNode

Run an example job

Finally, it’s possible to confirm that everything is working by running one of the example jobs. Let’s estimate the value of pi.

hadoop@hadoop-master:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 30
Number of Maps  = 10
Samples per Map = 30
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/09/30 19:54:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.51.4:8032
15/09/30 19:54:29 INFO input.FileInputFormat: Total input paths to process : 10
15/09/30 19:54:29 INFO mapreduce.JobSubmitter: number of splits:10
15/09/30 19:54:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443642855962_0001
15/09/30 19:54:29 INFO impl.YarnClientImpl: Submitted application application_1443642855962_0001
15/09/30 19:54:29 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1443642855962_0001/
15/09/30 19:54:29 INFO mapreduce.Job: Running job: job_1443642855962_0001
15/09/30 19:54:38 INFO mapreduce.Job: Job job_1443642855962_0001 running in uber mode : false
15/09/30 19:54:38 INFO mapreduce.Job:  map 0% reduce 0%
15/09/30 19:54:52 INFO mapreduce.Job:  map 40% reduce 0%
15/09/30 19:54:56 INFO mapreduce.Job:  map 100% reduce 0%
15/09/30 19:54:59 INFO mapreduce.Job:  map 100% reduce 100%
15/09/30 19:54:59 INFO mapreduce.Job: Job job_1443642855962_0001 completed successfully
15/09/30 19:54:59 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=226
                FILE: Number of bytes written=1272744
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2710
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=140318
                Total time spent by all reduces in occupied slots (ms)=4742
                Total time spent by all map tasks (ms)=140318
                Total time spent by all reduce tasks (ms)=4742
                Total vcore-seconds taken by all map tasks=140318
                Total vcore-seconds taken by all reduce tasks=4742
                Total megabyte-seconds taken by all map tasks=143685632
                Total megabyte-seconds taken by all reduce tasks=4855808
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1530
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=3509
                CPU time spent (ms)=5620
                Physical memory (bytes) snapshot=2688745472
                Virtual memory (bytes) snapshot=20847497216
                Total committed heap usage (bytes)=2040528896
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1180
        File Output Format Counters
                Bytes Written=97
Job Finished in 31.245 seconds
Estimated value of Pi is 3.16000000000000000000

Security and Configuration

This example is not production hardened. It does nothing to address firewall management, and the key management is deliberately permissive to make communication between nodes easy. For a production deployment, it would be straightforward to add a role that sets up a firewall. You may also want to be more cautious about accepting host keys between hosts.
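As an illustration of what such a hardening role could look like, here is a sketch using Ansible’s ufw module. These tasks are not part of the repository, and the allowed addresses would need to match your inventory:

```yaml
# roles/firewall/tasks/main.yml -- hypothetical hardening sketch, not in the repo.
- name: Allow traffic only from cluster members
  ufw: rule=allow from_ip={{ item }}
  with_items:
    - 192.168.51.4
    - 192.168.51.5
    - 192.168.51.6

- name: Deny everything else and enable the firewall
  ufw: state=enabled policy=deny
```

A real role would likely also open specific service ports to clients (e.g. the NameNode and ResourceManager web UIs) rather than trusting whole hosts.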

Default Ports

Many people ask what the default ports are for Hadoop services. The following four links list all the properties that can be set for each of the main components, including the defaults used when a property is absent from the configuration file. If a property isn’t overridden in the Ansible role templates in the git repository, it takes the default value shown in the links below.

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
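If you would rather pin the ports down explicitly than rely on defaults, one approach (with hypothetical variable names) is to surface them as variables that the role templates render into the XML config files:

```yaml
# group_vars/all -- hypothetical port variables for the role templates.
hadoop_namenode_port: 9000          # fs.defaultFS in core-site.xml
hadoop_namenode_http_port: 50070    # dfs.namenode.http-address in hdfs-site.xml
yarn_resourcemanager_port: 8032     # yarn.resourcemanager.address in yarn-site.xml
```

The templates would then reference these as {{ hadoop_namenode_port }} and so on, keeping every non-default value visible in one place.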

Problems spanning subnets

While developing this automation, I originally had the datanodes running on a separate subnet. There is a problem in Hadoop that prevents nodes from communicating across subnets. The following thread covers some of the discussion.

http://mail-archives.apache.org/mod_mbox/hadoop-user/201509.mbox/%3CCAKFXasEROCe%2BfL%2B8T7A3L0j4Qrm%3D4HHuzGfJhNuZ5MqUvQ%3DwjA%40mail.gmail.com%3E

Resources

While developing my Ansible scripts I leaned heavily on this tutorial:
https://chawlasumit.wordpress.com/2015/03/09/install-a-multi-node-hadoop-cluster-on-ubuntu-14-04/

Software Engineering

The Road to PaaS

I have observed that discussions about CloudFoundry often lack accurate context. Some questions I get that indicate context is missing include:

  • What Java version does CloudFoundry support?
  • What database products/versions are available?
  • How can I access the server directly?

There are a few reasons that the questions above are not relevant for CloudFoundry (or any modern PaaS environment). To understand why, it’s important to understand how we got to PaaS and where we came from.

cloudfoundry-compared-traditional

Landscape

When computers were first becoming a common requirement for the enterprise, most applications were monolithic. All application components would run on the same general purpose server. This included the interface, application technology (e.g. Java, .NET and PHP) and data and file storage. Over time, these functions were distributed across different servers. The servers also began to take on characteristic differences that would accommodate the technology being run.

Today, compute has been commoditized and virtualized. Rather than thinking of compute as a physical server built to suit a specific purpose, compute is instead viewed in discrete chunks that can be scaled horizontally. PaaS today marries an application with those chunks of compute capacity as needed and abstracts application access to services, which may or may not run on the same PaaS platform.

Contributor and Organization Dynamic

The roles of contributors and organizations have changed throughout the evolution of the landscape. Early monolithic systems required technology experts who were familiar with a broad range of technologies, including system administration, programming, networking, etc. As the functions were distributed, the roles became more defined by their specializations: webmasters, DBAs and programmers became siloed. This more distributed architecture introduced some unintended conflicts, due in part to the fact that efficiencies in one silo did not always align with the best interests of other silos.

DevOps

As the evolution pushed toward compute as a commodity, the newfound flexibility drove many frustrated technologists to reach beyond their respective silos to accomplish their design and delivery objectives. Programmers began to look at how different operating system environments and database technologies could enable them to produce results faster and more reliably. System administrators began to rethink system management in ways that abstracted hardware dependencies and decreased the complexity involved in augmenting compute capacity available to individual functions. Datastore, network, storage and other experts began a similar process of abstracting their offerings. This blending of roles and new dynamic of collaboration and contribution has come to be known as DevOps.

Interoperability

Interoperability between systems and applications in the days of monolithic application development made use of many protocols. This was due in part to the fact that each monolithic system exposed its services in different ways. As the above progression took place, the field of available protocols normalized. RESTful interfaces over HTTP have emerged as an accepted standard, and the serialization structures most common to REST are XML and JSON. This makes integration straightforward and provides for a high degree of reuse of existing services. It also makes services available to a greater diversity of devices.

Security and Isolation

One key development that made this evolution from compute as hardware to compute as a utility possible was effective isolation of compute resources on shared hardware. The first big step in this direction came in the form of virtualization. Virtualized hardware made it possible to run many distinct operating systems simultaneously on the same hardware. It also significantly reduced the time to provision new server resources, since the underlying hardware was already wired and ready.

Compute as a ________

The next step in the evolution came in the form of containers. Unlike virtualization, containers made it possible to provide an isolated, configurable compute instance in much less time and with fewer system resources to create and manage (i.e. lightweight). This progression from compute as hardware, to compute as virtual, and finally to compute as a container made it realistic to view compute as discrete chunks that can be created and destroyed in seconds as capacity requirements change.

Infrastructure as Code

Another important observation regarding the evolution of compute is that as the compute environment became easier to create (time to provision decreased), the process to provision changed. When a physical server required ordering, shipping, mounting, wiring, etc., it was reasonable to take a day or two to install and configure the operating system, network and related components. When that hardware was virtualized and could be provisioned in hours (or less), system administrators began to pursue more automation to accommodate the setup of these systems (e.g. Ansible, Puppet, Chef and even Vagrant). This made it possible to think of systems as more transient. With the advent of Linux containers, the idea of infrastructure as code became even more prevalent. Time to provision is approaching zero.

A related byproduct of infrastructure defined by scripts or code was reproducibility. Whereas it was historically difficult to ensure that two systems were configured identically, the method for provisioning containers made it trivial to ensure that compute resources were identically configured. This in turn improved debugging and collaboration, and accommodated versioning of operating environments.

Contextual Answers

Given that the landscape has changed so drastically, let’s look at some possible answers to the questions from the beginning of this post.

  • Q. What Java (or any language) version does CloudFoundry support?
    A. It supports any language that is defined in the scripts used to provision the container that will run the application. While it is true that some such scripts may be available by default, this doesn’t imply that the PaaS provides only that. If it’s a fit, use it. If not, create new provisioning scripts.
  • Q. What database products/versions are available?
    A. Any database product or version can be used. If the datastore services available that are associated with the PaaS by default are not sufficient, bring your own or create another application component to accommodate your needs.
  • Q. How can I access the server directly?
    A. There is no “the server”. If you want to know more about the server environment, look at the script/code that is responsible for provisioning it. Even better, create a new container and play around with it. Once you get things just right, update your code so that every new container incorporates the desired changes. Every “the server” will look exactly how you define it.
Software Engineering

Explore CloudFoundry using bosh-lite on Windows

It seems like most of the development around CloudFoundry and bosh happens on Linux or Mac. Getting things up and running on Windows was a real challenge. Below is how I worked things out.

Note: make sure you have a modern processor that supports hardware virtualization technologies, such as VT-x and extended page tables.

Aside from the deviations mentioned below, I’m following the steps documented at https://github.com/cloudfoundry/bosh-lite

Changes to Vagrantfile

I’m using VirtualBox on Windows 7. To begin with, I modified the Vagrantfile to create two VMs rather than a single VM. The first is the VM that will run CloudFoundry. The second is to run bosh for the deployment of CloudFoundry. I use a second Linux VM to execute the bosh deployment since all the commands and files were developed in a *nix environment.

I am also more explicit in my network setup. I want the two hosts to have free communication on a local private network. I leave the default IP address assignment for the CloudFoundry host. For the bosh host I change the last octet of the IP address to 14.

  config.vm.define "cf" do |cf|
    cf.vm.provider :virtualbox do |v, override|
      override.vm.box = 'cloudfoundry/bosh-lite'
      override.vm.box_version = '388'
 
      # To use a different IP address for the bosh-lite director, uncomment this line:
      override.vm.network :private_network, ip: '192.168.50.4', id: :local
      override.vm.network :public_network
    end
  end
 
  config.vm.define "boshlite" do |boshlite|
    boshlite.vm.provider :virtualbox do |v, override|
      override.vm.box = 'ubuntu/trusty64'
 
      # To use a different IP address for the bosh-lite director, uncomment this line:
      override.vm.network :private_network, ip: '192.168.50.14', id: :local
      override.vm.network :public_network
      v.memory = 6144
      v.cpus = 2
    end
  end

At this point you can spin up the two hosts.

vagrant up --provider=virtualbox

The remaining steps need to happen on your bosh deployment host (192.168.50.14 based on the Vagrantfile above). In case you need it, here is a refresher on setting up Vagrant SSH connectivity using PuTTY on Windows.

Prepare for provision_cf

If you are in a proxied environment, you’ll need to set the environment variables, including no_proxy for the CloudFoundry host. I include xip.io for ease of access in future steps.

export http_proxy=http://proxy.domain.com:8080
export https_proxy=https://proxy.domain.com:8080
export no_proxy=192.168.50.4,xip.io

Next we need to get prerequisites going and then install the bosh CLI. You may have some of these already, and you may need some additional libraries. This is based on a clean Ubuntu trusty 64 box.

sudo -E add-apt-repository multiverse
sudo -E apt-get update
sudo -E apt-get -y install build-essential linux-headers-`uname -r`
sudo -E apt-get -y install ruby ruby-dev git zip

Now bosh_cli can be installed. I’ve added flags to skip ‘ri’ and ‘rdoc’ since they take a long time. If you really want those, you can drop those arguments.

sudo -E gem install bosh_cli --no-ri --no-rdoc

We also need spiff on this system. Here I grab and unzip the latest spiff, then move the binary into /usr/local/bin.

wget https://github.com/cloudfoundry-incubator/spiff/releases/download/v1.0.3/spiff_linux_amd64.zip
unzip spiff_linux_amd64.zip
sudo mv spiff /usr/local/bin/

Next we need to clone both bosh-lite and cf-release. Even though the contents of bosh-lite are available in “/vagrant”, we need these two directories side by side, so it’s easiest to just clone them both into the home directory of the bosh deployment host. We then change into the bosh-lite directory.

git clone https://github.com/cloudfoundry/bosh-lite.git
git clone https://github.com/cloudfoundry/cf-release
cd bosh-lite/

The script ./bin/provision_cf needs to be edited so that get_ip_from_vagrant_ssh_config simply outputs the private network IP address that was assigned in the Vagrantfile. The default functionality assumes that the provision script is run from the host running Vagrant and VirtualBox. However, these commands are running on the bosh deployment host, which doesn’t know anything about Vagrant or VirtualBox. Here’s what the function should look like.

get_ip_from_vagrant_ssh_config() {
  echo 192.168.50.4
}

Target bosh and provision

Everything is set to target the bosh host, set the route and provision CloudFoundry. When you first target the CloudFoundry host, it will ask for credentials to log in.

vagrant@vagrant-ubuntu-trusty-64:~/bosh-lite$ bosh target 192.168.50.4 lite
Target set to `Bosh Lite Director'
Your username: admin
Enter password: *****
Logged in as `admin'

Next we can add the route to the bosh deployment host.

vagrant@vagrant-ubuntu-trusty-64:~/bosh-lite$ ./bin/add-route
Adding the following route entry to your local route table to enable direct warden container access. Your sudo password may be required.
  - net 10.244.0.0/19 via 192.168.50.4
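To double-check that the route actually landed, you can grep the routing table. The helper below is my own convenience, not part of bosh-lite; the CIDR and gateway are the defaults shown in the output above, and it works against captured `ip route` output:

```shell
# has_warden_route: succeed when the given routing table text contains
# the warden container route that ./bin/add-route should have added.
has_warden_route() {
  echo "$1" | grep -q "10.244.0.0/19 via 192.168.50.4"
}

# Example against captured `ip route` output:
routes="default via 10.0.2.2 dev eth0
10.244.0.0/19 via 192.168.50.4 dev eth1"
has_warden_route "$routes" && echo "warden route present"
```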

Provision CloudFoundry

The only thing left to do is provision CloudFoundry.

./bin/provision_cf
...
Started         2014-09-29 18:54:39 UTC
Finished        2014-09-29 19:36:11 UTC
Duration        00:41:32
 
Deployed `cf-manifest.yml' to `Bosh Lite Director'

This takes quite a while (possibly hours depending on your hardware). If you have an older processor that doesn’t support all the modern virtualization technologies, this could take much longer.

Verify your new CloudFoundry deployment

In order to use CloudFoundry we need the ‘cf’ client. The cf client is available as a binary download from the main GitHub page for CloudFoundry. The following commands will prepare the cf CLI for use.

wget http://go-cli.s3-website-us-east-1.amazonaws.com/releases/v6.6.1/cf-linux-amd64.tgz
tar xzvf cf-linux-amd64.tgz
sudo mv cf /usr/local/bin/

With the cf CLI installed, it is now possible to connect to the API and set up org and space details.

cf api --skip-ssl-validation https://api.10.244.0.34.xip.io
cf auth admin admin
cf create-org myorg
cf target -o myorg
cf create-space mydept
cf target -o myorg -s mydept

You should now have an environment that matches the following.

API endpoint:   https://api.10.244.0.34.xip.io (API version: 2.14.0)
User:           admin
Org:            myorg
Space:          mydept

Deploy an app

You can now deploy an application. To verify, create a directory and add a file:

index.php

<?php phpinfo(); ?>

Now push that app as follows:

vagrant@vagrant-ubuntu-trusty-64:~/test-php$ cf push test-php
Creating app test-php in org myorg / space mydept as admin...
OK
 
Creating route test-php.10.244.0.34.xip.io...
OK
 
Binding test-php.10.244.0.34.xip.io to test-php...
OK
 
Uploading test-php...
Uploading app files from: /home/vagrant/test-php
Uploading 152, 1 files
OK
 
Starting app test-php in org myorg / space mydept as admin...
OK
-----> Downloaded app package (4.0K)
Use locally cached dependencies where possible
 !     WARNING:        No composer.json found.
       Using index.php to declare PHP applications is considered legacy
       functionality and may lead to unexpected behavior.
       See https://devcenter.heroku.com/categories/php
-----> Setting up runtime environment...
       - PHP 5.5.12
       - Apache 2.4.9
       - Nginx 1.4.6
-----> Installing PHP extensions:
       - opcache (automatic; bundled, using 'ext-opcache.ini')
-----> Installing dependencies...
       Composer version ac497feabaa0d247c441178b7b4aaa4c61b07399 2014-06-10 14:13:12
       Warning: This development build of composer is over 30 days old. It is recommended to update it by running "/app/.heroku/php/bin/composer self-update" to get the latest version.
       Loading composer repositories with package information
       Installing dependencies
       Nothing to install or update
       Generating optimized autoload files
-----> Building runtime environment...
       NOTICE: No Procfile, defaulting to 'web: vendor/bin/heroku-php-apache2'
-----> Uploading droplet (64M)
 
0 of 1 instances running, 1 starting
1 of 1 instances running
 
App started
 
Showing health and status for app test-php in org myorg / space mydept as admin...
OK
 
requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: test-php.10.244.0.34.xip.io
 
     state     since                    cpu    memory          disk
#0   running   2014-09-29 07:52:38 PM   0.0%   84.9M of 256M   0 of 1G

It’s now possible to view the app using a browser. From the command line you can access it using this command:

w3m http://test-php.10.244.0.34.xip.io

Observations

In my tests, the xip.io resolution was flaky. I saw intermittent failures with the response:

dial tcp: lookup api.10.244.0.34.xip.io: no such host

In some cases I would have to run the same command a few times before it could resolve the host.
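A small retry wrapper makes that less tedious. This is my own sketch, not part of bosh-lite or the cf CLI, and the attempt count is arbitrary:

```shell
# retry N CMD...: run CMD up to N times, pausing briefly between
# attempts; crude insurance against intermittent xip.io lookups.
retry() {
  local attempts=$1; shift
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}

# Usage, e.g.:
#   retry 5 cf api --skip-ssl-validation https://api.10.244.0.34.xip.io
```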

The VMs I set up obtained IP addresses on my network. However, when I tried to access apps or the API over that IP address, the connection was refused. Despite adding the domain (e.g. dhcpip.xip.io) to CloudFoundry and creating routes to my application, all attempts to use the API or load apps over the external IP failed.

Software Engineering

Build a Multi-server LEMP stack using Ansible

My objective in this post is to explore the use of Ansible to configure a multi-server LEMP stack. This builds on the preliminary work I did demonstrating how to use Vagrant to create an environment to run Ansible. You can follow this entire example on any Windows (or Linux) host.

Ansible only runs on Linux hosts, not Windows. As a result, I needed to provision one Linux host to act as Ansible controller. One aspect of Ansible that I wanted to explore is the ability to manage multiple hosts with different configurations. For this experiment, I provision two more Linux hosts, one to act as a database host and the other to function as an Nginx/PHP server for a complete LEMP stack. I created the diagram below to illustrate my setup.

vagrant-ansible-lemp

There are two primary artifact categories for this experiment:

  • Vagrantfile to provision each host
  • Ansible playbook related files

Since there were more than a few Ansible playbook files, I chose to create a github repository rather than provide all the code here. You can clone/fork the files to run this experiment here:

https://github.com/dwatrous/vagrant-ansible-lemp

Explanation

Here is a list of the files you’ll find in that repository.

  • Vagrantfile
  • control.sh
  • lemp/group_vars/all
  • lemp/hosts
  • lemp/roles/common/handlers/main.yml
  • lemp/roles/common/tasks/main.yml
  • lemp/roles/database/handlers/main.yml
  • lemp/roles/database/tasks/main.yml
  • lemp/roles/web/handlers/main.yml
  • lemp/roles/web/tasks/main.yml
  • lemp/roles/web/templates/default
  • lemp/roles/web/templates/wall.php
  • lemp/site.yml

I do use a bootstrap shell script, control.sh, with Vagrant for the Ansible control server. It is necessary to install Ansible on the control server, but since Ansible doesn’t require an agent, there’s no need to bootstrap the other servers.

Playbook files

For each Ansible defined role there are three artifact categories.

  • handlers
  • tasks
  • templates

Handlers are named tasks that can be called or notified when Ansible detects a change. For example, they are commonly used to trigger a service restart when a configuration file changes.

Tasks are the meat of the playbook. They list the steps required to put a system into the desired state, including installing software, copying templates, registering and notifying handlers, etc.

Configuration files, such as the nginx ‘default’ configuration in this case, can be stored in the templates folder and copied to the host using a task. Templates are helpful when a desired configuration differs significantly from a system default; copying a complete file can be easier than updating individual lines one at a time with lineinfile. The Ansible playbook files are in the following directory.

/vagrant/lemp
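The template-versus-lineinfile trade-off can be sketched in plain shell; the config directives below are invented purely for illustration:

```shell
# Start from a "system default" config file.
conf=$(mktemp)
printf 'worker_processes 1;\nkeepalive_timeout 65;\n' > "$conf"

# lineinfile-style: converge one directive in place.
sed -i 's/^worker_processes .*/worker_processes 4;/' "$conf"

# template-style: replace the whole file with a known-good copy,
# which is simpler once many directives differ from the default.
printf 'worker_processes 4;\nkeepalive_timeout 30;\n' > "$conf"

cat "$conf"
```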

The site.yml file ties it all together by associating host groups with roles. You run the playbook like this.

ansible-playbook -i hosts site.yml

The example wall.php script should be accessible locally using the port 80->8080 mapping as http://127.0.0.1:8080/wall.php or over port 80 on the external IP assigned to the web host. Here’s what you can expect to see.

ansible-wall-example

Resources

I used the ansible examples repository on Github while putting this together. You may find it useful. For the specifics of installing LEMP on Ubuntu, I followed my Vagrant tutorial.

Software Engineering

Using Vagrant to Explore Ansible

Last week I wrote about Vagrant, a fantastic tool to spin up virtual development environments. Today I’m exploring Ansible. Ansible is an open source tool which streamlines certain system administration activities. Unlike Vagrant, which provisions new machines, Ansible takes an already provisioned machine and configures it. This can include installing and configuring software, managing services, and even running simple commands. Ansible doesn’t require any agent software to be installed on the system being managed. Everything is executed over SSH.

Ansible only runs on Linux (though I’ve heard of people running it in Cygwin with some difficulty). In order to play with Ansible, I used Vagrant to spin up a control box and a subject box that are connected in a way that lets me easily run Ansible commands. Here’s my Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  # define ansible subject (web in this case) box
  config.vm.define "subject" do |subject|
    subject.vm.box = "ubuntu/trusty64"
    subject.vm.network "public_network"
    subject.vm.network "private_network", ip: "192.168.51.4"
    subject.vm.provider "virtualbox" do |v|
      v.name = "Ansible subject"
      v.cpus = 2
      v.memory = 768
    end
    # copy private key so hosts can ssh using key authentication (the script below sets permissions to 600)
    subject.vm.provision :file do |file|
      file.source      = 'C:\Users\watrous\.vagrant.d\insecure_private_key'
      file.destination = '/home/vagrant/.ssh/id_rsa'
    end
    subject.vm.provision :shell, path: "subject.sh"
    subject.vm.network "forwarded_port", guest: 80, host: 8080
  end
 
  # define ansible control box (provision this last so it can add other hosts to known_hosts for ssh authentication)
  config.vm.define "control" do |control|
    control.vm.box = "ubuntu/trusty64"
    control.vm.network "public_network"
    control.vm.network "private_network", ip: "192.168.50.4"
    control.vm.provider "virtualbox" do |v|
      v.name = "Ansible control"
      v.cpus = 1
      v.memory = 512
    end
    # copy private key so hosts can ssh using key authentication (the script below sets permissions to 600)
    control.vm.provision :file do |file|
      file.source      = 'C:\Users\watrous\.vagrant.d\insecure_private_key'
      file.destination = '/home/vagrant/.ssh/id_rsa'
    end
    control.vm.provision :shell, path: "control.sh"
  end
 
  # consider using agent forwarding instead of manually copying the private key as I did above
  # config.ssh.forward_agent = true
 
end

Notice that I created a public network to get an external address via DHCP. I also created a private network with statically assigned addresses. This is so I can tell Ansible, in the hosts file, where to locate all of the inventory.

I had trouble getting SSH agent forwarding to work on Windows through PuTTY, so for now I’m manually placing the private key and updating known_hosts with the ‘ssh-keyscan’ command. You can see part of this in the Vagrantfile above. The remaining work is done in two scripts, one for the control and one for the subject.

control.sh

#!/usr/bin/env bash
 
# set proxy variables
#export http_proxy=http://myproxy.com:8080
#export https_proxy=https://myproxy.com:8080
 
# install pip, then use pip to install ansible
apt-get -y install python-dev python-pip
pip install ansible
 
# fix permissions on private key file
chmod 600 /home/vagrant/.ssh/id_rsa
 
# add subject host to known_hosts (IP is defined in Vagrantfile)
ssh-keyscan -H 192.168.51.4 >> /home/vagrant/.ssh/known_hosts
chown vagrant:vagrant /home/vagrant/.ssh/known_hosts
 
# create ansible hosts (inventory) file
mkdir -p /etc/ansible/
cat /vagrant/hosts >> /etc/ansible/hosts

subject.sh

#!/usr/bin/env bash
 
# fix permissions on private key file
chmod 600 /home/vagrant/.ssh/id_rsa

I also copy this hosts file into place on the control system so it knows which inventory to operate against.

hosts

[targets]
localhost   ansible_connection=local
192.168.51.4    ansible_connection=ssh

After running ‘vagrant up‘, I can verify that the control box is able to access the subject box using the ping module in ansible.
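The verification is a single command from the control box. It is guarded here so the snippet degrades gracefully on a machine without Ansible installed:

```shell
# Exercise Ansible's ping module against the [targets] group from the
# inventory file above. Falls back to a message if ansible is absent.
if command -v ansible >/dev/null 2>&1; then
  ansible targets -m ping
else
  echo "ansible not installed on this machine"
fi
```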

vagrant-ansible

Conclusion

This post doesn’t demonstrate the use of Ansible, aside from the ping command. What it does do is provide an environment where I can build and run Ansible playbooks, which is exactly what I plan to do next.

Software Engineering

Using Vagrant to build a LEMP stack

I may have just fallen in love with the tool Vagrant. Vagrant makes it possible to quickly create a virtual environment for development. It differs from cloning or snapshots in that it uses minimal base OSes and provides a provisioning mechanism to set up and configure the environment exactly the way you want for development. I love this for a few reasons:

  • All developers work in the exact same environment
  • Developers can get a new environment up in minutes
  • Developers don’t need to be experts at setting up the environment
  • System details can be versioned and stored alongside code

This short tutorial below demonstrates how easy it is to build a LEMP stack using Vagrant.

Install VirtualBox

Vagrant is not a virtualization tool. Instead, Vagrant leverages an existing provider of virtual compute resources, either local or remote. For example, Vagrant can be used to create a virtual environment on Amazon Web Services or locally using a tool like VirtualBox. For this tutorial, we’ll use VirtualBox. You can download and install VirtualBox from the official website.

https://www.virtualbox.org/

Install Vagrant

Next, we install Vagrant. Downloads are freely available on their website.

http://www.vagrantup.com/

For the remainder of this tutorial, I’m going to assume that you’ve been through the getting started training and are somewhat familiar with Vagrant.

Accommodate SSH Keys

UPDATE 6/26/2015: Vagrant introduced the unfortunate feature of producing a random key for each new VM as the default behavior. It’s possible to restore the original functionality (described below) and use the insecure key with the config.ssh.insert_key = false setting in a Vagrantfile.

Until (if ever) Vagrant defaults to using the insecure key, a system-wide workaround is to add a Vagrantfile to the local .vagrant.d folder, which will apply this setting to all VMs (see Load Order and Merging), unless otherwise overridden. The Vagrantfile can be as simple as this:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.ssh.insert_key = false
 
end

Vagrant creates an SSH key which it installs on guest hosts by default. This can be a huge time saver since it prevents the need for passwords. Since I use PuTTY on Windows, I needed to convert the SSH key and save a PuTTY session to accommodate connections. Use PuTTYgen to do this.

  1. Open PuTTYgen
  2. Click “Load”
  3. Navigate to the file C:\Users\watrous\.vagrant.d\insecure_private_key

PuTTYgen shows a dialog saying that the import was successful and displays the details of the key, as shown here:

import-vagrant-ssh-key-puttygen

Click “Save private key”. You will be prompted about saving the key without a passphrase, which in this case is fine, since it’s just for local development. If you end up using Vagrant to create public instances, such as using Amazon Web Services, you should use a more secure connection method. Give the key a unique name, like C:\Users\watrous\.vagrant.d\insecure_private_key-putty.ppk and save.

Finally, create a saved PuTTY session to connect to new Vagrant instances. Here are some of my PuTTY settings:

putty-session-vagrant-settings-1

putty-session-vagrant-settings-auth

The username may change if you choose a different base OS image from the vagrant cloud, but the settings shown above should work fine for this tutorial.

Get Ready to ‘vagrant up’

Create a directory where you can store the files Vagrant needs to spin up your environment. I’ll refer to this directory as VAGRANT_ENV.

To build a LEMP stack we need a few things. First is a Vagrantfile, where we identify the base OS (or box), forwarded ports, etc. This is a text file that follows Ruby language conventions. Create the file VAGRANT_ENV/Vagrantfile with the following contents:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.
 
  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision :shell, path: "bootstrap.sh"
  config.vm.network :forwarded_port, host: 4567, guest: 80
  config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
end

This file chooses a 64-bit Trusty version of Ubuntu, forwards port 4567 on the host machine to port 80 on the guest machine, and identifies a bootstrap shell script, which I show next.

Create VAGRANT_ENV/bootstrap.sh with the following contents:

#!/usr/bin/env bash
 
#accommodate proxy environments
#export http_proxy=http://proxy.company.com:8080
#export https_proxy=https://proxy.company.com:8080
apt-get -y update
apt-get -y install nginx
debconf-set-selections <<< 'mysql-server mysql-server/root_password password secret'
debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password secret'
apt-get -y install mysql-server
#mysql_install_db
#mysql_secure_installation
apt-get -y install php5-fpm php5-mysql
sed -i s/\;cgi\.fix_pathinfo\s*\=\s*1/cgi.fix_pathinfo\=0/ /etc/php5/fpm/php.ini
service php5-fpm restart
mv /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak
cp /vagrant/default /etc/nginx/sites-available/default
service nginx restart
echo "<?php phpinfo(); ?>" > /usr/share/nginx/html/info.php

This script executes a sequence of commands from the shell as root after provisioning the new server. This script must run without requiring user input. It also should accommodate any configuration changes and restarts necessary to get your environment ready to use.

More sophisticated tools like Ansible, Chef and Puppet can also be used.

You may have noticed that the above script expects a modified version of nginx’s default configuration. Create the file VAGRANT_ENV/default with the following contents:

server {
	listen 80 default_server;
	listen [::]:80 default_server ipv6only=on;
 
	root /usr/share/nginx/html;
	index index.php index.html index.htm;
 
	server_name localhost;
 
	location / {
		try_files $uri $uri/ =404;
	}
 
	error_page 404 /404.html;
 
	error_page 500 502 503 504 /50x.html;
	location = /50x.html {
		root /usr/share/nginx/html;
	}
 
	location ~ \.php$ {
		fastcgi_split_path_info ^(.+\.php)(/.+)$;
		fastcgi_pass unix:/var/run/php5-fpm.sock;
		fastcgi_index index.php;
		include fastcgi_params;
	}
}

vagrant up

Now it’s time to run ‘vagrant up‘. To do this, open a console window and navigate to your VAGRANT_ENV directory, then run ‘vagrant up’.

vagrant-up-console

If this is the first time you have run ‘vagrant up’, it may take a few minutes to download the ‘box’. Once it’s done, you should be ready to visit your PHP page rendered by nginx on a local virtual machine created and configured by Vagrant:

http://127.0.0.1:4567/info.php
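As one last sanity check from the host machine, you can probe that same forwarded port from the command line. This snippet is just a convenience of mine and reports either way:

```shell
# Probe the forwarded port; the first branch fires only when the
# Vagrant VM is up and nginx is passing info.php to PHP-FPM.
if curl -s --max-time 5 http://127.0.0.1:4567/info.php | grep -qi "phpinfo\|PHP Version"; then
  echo "LEMP stack is serving PHP"
else
  echo "no response on 127.0.0.1:4567 (is the VM up?)"
fi
```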