Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions


Install and configure a Multi-node Hadoop cluster using Ansible

I’ve recently been involved with several groups interested in using Hadoop to process large sets of data, including the use of higher-level abstractions on top of Hadoop like Pig and Hive. What has surprised me most is that no one is automating their installation of Hadoop. In each case I’ve observed, they start by manually provisioning some servers and then follow a series of tutorials to manually install and configure a cluster. The typical experience seems to take about a week to set up a cluster, and a lot of that time is wasted dealing with networking and connectivity between hosts.

After telling several groups that they should automate the installation of Hadoop using something like Ansible, I decided to create an example. All the scripts needed to install a new Hadoop cluster in minutes are on GitHub for you to fork: https://github.com/dwatrous/hadoop-multi-server-ansible

I have also recorded a video demonstration of the following process.

Scope

The scope of this article is to create a three-node cluster on a single computer (Windows in my case) using VirtualBox and Vagrant. The cluster includes HDFS and MapReduce running on all three nodes. The following diagram will help to visualize the cluster.

[Diagram: hadoop-design — layout of the three-node Hadoop cluster]
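The cluster in the diagram is defined by the repository’s Vagrantfile. As a simplified sketch of what that definition looks like (the IPs and hostnames come from the provisioning output later in this post; the actual file in the repository is authoritative and may differ):

```ruby
# Sketch of a three-server Vagrantfile for this cluster (not the repo's exact file)
Vagrant.configure("2") do |config|
  config.vm.box = "ubuntu/trusty64"

  servers = {
    "master" => "192.168.51.4",
    "data1"  => "192.168.51.5",
    "data2"  => "192.168.51.6",
  }

  servers.each do |name, ip|
    config.vm.define name do |node|
      node.vm.hostname = "hadoop-#{name}"
      node.vm.network "private_network", ip: ip
      node.vm.provider "virtualbox" do |vb|
        vb.memory = 3072   # each server needs 3GB RAM
      end
    end
  end
end
```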

Build the servers

The first step is to install VirtualBox and Vagrant.

Clone hadoop-multi-server-ansible and open a console window in the directory where you cloned it. The Vagrantfile defines three Ubuntu 14.04 servers. Each server needs 3GB of RAM, so make sure you have enough RAM available. Now run vagrant up and wait a few minutes for the new servers to come up.
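To put a number on that RAM requirement, a quick back-of-the-envelope check (the node count and per-node memory come from the description above):

```shell
# Rough pre-flight arithmetic: three servers at 3 GB (3072 MB) each
NODES=3
MB_PER_NODE=3072
REQUIRED_MB=$((NODES * MB_PER_NODE))
echo "The cluster will need about ${REQUIRED_MB} MB of free RAM"
```

That is roughly 9 GB on top of whatever the host OS itself is using.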

C:\Users\watrous\Documents\hadoop>vagrant up
Bringing machine 'master' up with 'virtualbox' provider...
Bringing machine 'data1' up with 'virtualbox' provider...
Bringing machine 'data2' up with 'virtualbox' provider...
==> master: Importing base box 'ubuntu/trusty64'...
==> master: Matching MAC address for NAT networking...
==> master: Checking if box 'ubuntu/trusty64' is up to date...
==> master: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> master: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> master: `vagrant box update` to update.
==> master: Setting the name of the VM: master
==> master: Clearing any previously set forwarded ports...
==> master: Clearing any previously set network interfaces...
==> master: Preparing network interfaces based on configuration...
    master: Adapter 1: nat
    master: Adapter 2: hostonly
==> master: Forwarding ports...
    master: 22 => 2222 (adapter 1)
==> master: Running 'pre-boot' VM customizations...
==> master: Booting VM...
==> master: Waiting for machine to boot. This may take a few minutes...
    master: SSH address: 127.0.0.1:2222
    master: SSH username: vagrant
    master: SSH auth method: private key
    master: Warning: Connection timeout. Retrying...
==> master: Machine booted and ready!
==> master: Checking for guest additions in VM...
==> master: Setting hostname...
==> master: Configuring and enabling network interfaces...
==> master: Mounting shared folders...
    master: /home/vagrant/src => C:/Users/watrous/Documents/hadoop
==> master: Running provisioner: file...
==> master: Running provisioner: shell...
    master: Running: C:/Users/watrous/AppData/Local/Temp/vagrant-shell20150930-12444-1lgl5bq.sh
==> master: stdin: is not a tty
==> master: Ign http://archive.ubuntu.com trusty InRelease
==> master: Ign http://archive.ubuntu.com trusty-updates InRelease
==> master: Ign http://security.ubuntu.com trusty-security InRelease
==> master: Hit http://archive.ubuntu.com trusty Release.gpg
==> master: Get:1 http://security.ubuntu.com trusty-security Release.gpg [933 B]
==> master: Get:2 http://archive.ubuntu.com trusty-updates Release.gpg [933 B]
==> master: Hit http://archive.ubuntu.com trusty Release
==> master: Get:3 http://security.ubuntu.com trusty-security Release [63.5 kB]
==> master: Get:4 http://archive.ubuntu.com trusty-updates Release [63.5 kB]
==> master: Get:5 http://archive.ubuntu.com trusty/main Sources [1,064 kB]
==> master: Get:6 http://security.ubuntu.com trusty-security/main Sources [96.2 kB]
==> master: Get:7 http://security.ubuntu.com trusty-security/universe Sources [31.1 kB]
==> master: Get:8 http://security.ubuntu.com trusty-security/main amd64 Packages [350 kB]
==> master: Get:9 http://archive.ubuntu.com trusty/universe Sources [6,399 kB]
==> master: Get:10 http://security.ubuntu.com trusty-security/universe amd64 Packages [117 kB]
==> master: Get:11 http://security.ubuntu.com trusty-security/main Translation-en [191 kB]
==> master: Get:12 http://security.ubuntu.com trusty-security/universe Translation-en [68.2 kB]
==> master: Hit http://archive.ubuntu.com trusty/main amd64 Packages
==> master: Hit http://archive.ubuntu.com trusty/universe amd64 Packages
==> master: Hit http://archive.ubuntu.com trusty/main Translation-en
==> master: Hit http://archive.ubuntu.com trusty/universe Translation-en
==> master: Get:13 http://archive.ubuntu.com trusty-updates/main Sources [236 kB]
==> master: Get:14 http://archive.ubuntu.com trusty-updates/universe Sources [139 kB]
==> master: Get:15 http://archive.ubuntu.com trusty-updates/main amd64 Packages [626 kB]
==> master: Get:16 http://archive.ubuntu.com trusty-updates/universe amd64 Packages [320 kB]
==> master: Get:17 http://archive.ubuntu.com trusty-updates/main Translation-en [304 kB]
==> master: Get:18 http://archive.ubuntu.com trusty-updates/universe Translation-en [168 kB]
==> master: Ign http://archive.ubuntu.com trusty/main Translation-en_US
==> master: Ign http://archive.ubuntu.com trusty/universe Translation-en_US
==> master: Fetched 10.2 MB in 4s (2,098 kB/s)
==> master: Reading package lists...
==> master: Reading package lists...
==> master: Building dependency tree...
==> master:
==> master: Reading state information...
==> master: The following extra packages will be installed:
==> master:   build-essential dpkg-dev g++ g++-4.8 libalgorithm-diff-perl
==> master:   libalgorithm-diff-xs-perl libalgorithm-merge-perl libdpkg-perl libexpat1-dev
==> master:   libfile-fcntllock-perl libpython-dev libpython2.7-dev libstdc++-4.8-dev
==> master:   python-chardet-whl python-colorama python-colorama-whl python-distlib
==> master:   python-distlib-whl python-html5lib python-html5lib-whl python-pip-whl
==> master:   python-requests-whl python-setuptools python-setuptools-whl python-six-whl
==> master:   python-urllib3-whl python-wheel python2.7-dev python3-pkg-resources
==> master: Suggested packages:
==> master:   debian-keyring g++-multilib g++-4.8-multilib gcc-4.8-doc libstdc++6-4.8-dbg
==> master:   libstdc++-4.8-doc python-genshi python-lxml python3-setuptools zip
==> master: Recommended packages:
==> master:   python-dev-all
==> master: The following NEW packages will be installed:
==> master:   build-essential dpkg-dev g++ g++-4.8 libalgorithm-diff-perl
==> master:   libalgorithm-diff-xs-perl libalgorithm-merge-perl libdpkg-perl libexpat1-dev
==> master:   libfile-fcntllock-perl libpython-dev libpython2.7-dev libstdc++-4.8-dev
==> master:   python-chardet-whl python-colorama python-colorama-whl python-dev
==> master:   python-distlib python-distlib-whl python-html5lib python-html5lib-whl
==> master:   python-pip python-pip-whl python-requests-whl python-setuptools
==> master:   python-setuptools-whl python-six-whl python-urllib3-whl python-wheel
==> master:   python2.7-dev python3-pkg-resources unzip
==> master: 0 upgraded, 32 newly installed, 0 to remove and 29 not upgraded.
==> master: Need to get 41.3 MB of archives.
==> master: After this operation, 80.4 MB of additional disk space will be used.
==> master: Get:1 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libexpat1-dev amd64 2.1.0-4ubuntu1.1 [115 kB]
==> master: Get:2 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libpython2.7-dev amd64 2.7.6-8ubuntu0.2 [22.0 MB]
==> master: Get:3 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libstdc++-4.8-dev amd64 4.8.4-2ubuntu1~14.04 [1,052 kB]
==> master: Get:4 http://archive.ubuntu.com/ubuntu/ trusty-updates/main g++-4.8 amd64 4.8.4-2ubuntu1~14.04 [15.0 MB]
==> master: Get:5 http://archive.ubuntu.com/ubuntu/ trusty/main g++ amd64 4:4.8.2-1ubuntu6 [1,490 B]
==> master: Get:6 http://archive.ubuntu.com/ubuntu/ trusty-updates/main libdpkg-perl all 1.17.5ubuntu5.4 [179 kB]
==> master: Get:7 http://archive.ubuntu.com/ubuntu/ trusty-updates/main dpkg-dev all 1.17.5ubuntu5.4 [726 kB]
==> master: Get:8 http://archive.ubuntu.com/ubuntu/ trusty/main build-essential amd64 11.6ubuntu6 [4,838 B]
==> master: Get:9 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-diff-perl all 1.19.02-3 [50.0 kB]
==> master: Get:10 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-diff-xs-perl amd64 0.04-2build4 [12.6 kB]
==> master: Get:11 http://archive.ubuntu.com/ubuntu/ trusty/main libalgorithm-merge-perl all 0.08-2 [12.7 kB]
==> master: Get:12 http://archive.ubuntu.com/ubuntu/ trusty/main libfile-fcntllock-perl amd64 0.14-2build1 [15.9 kB]
==> master: Get:13 http://archive.ubuntu.com/ubuntu/ trusty/main libpython-dev amd64 2.7.5-5ubuntu3 [7,078 B]
==> master: Get:14 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python3-pkg-resources all 3.3-1ubuntu2 [31.7 kB]
==> master: Get:15 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-chardet-whl all 2.2.1-2~ubuntu1 [170 kB]
==> master: Get:16 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-colorama all 0.2.5-0.1ubuntu2 [18.4 kB]
==> master: Get:17 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-colorama-whl all 0.2.5-0.1ubuntu2 [18.2 kB]
==> master: Get:18 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python2.7-dev amd64 2.7.6-8ubuntu0.2 [269 kB]
==> master: Get:19 http://archive.ubuntu.com/ubuntu/ trusty/main python-dev amd64 2.7.5-5ubuntu3 [1,166 B]
==> master: Get:20 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-distlib all 0.1.8-1ubuntu1 [113 kB]
==> master: Get:21 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-distlib-whl all 0.1.8-1ubuntu1 [140 kB]
==> master: Get:22 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-html5lib all 0.999-3~ubuntu1 [83.5 kB]
==> master: Get:23 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-html5lib-whl all 0.999-3~ubuntu1 [109 kB]
==> master: Get:24 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-six-whl all 1.5.2-1ubuntu1 [10.5 kB]
==> master: Get:25 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-urllib3-whl all 1.7.1-1ubuntu3 [64.0 kB]
==> master: Get:26 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-requests-whl all 2.2.1-1ubuntu0.3 [227 kB]
==> master: Get:27 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-setuptools-whl all 3.3-1ubuntu2 [244 kB]
==> master: Get:28 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-pip-whl all 1.5.4-1ubuntu3 [111 kB]
==> master: Get:29 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-setuptools all 3.3-1ubuntu2 [230 kB]
==> master: Get:30 http://archive.ubuntu.com/ubuntu/ trusty-updates/universe python-pip all 1.5.4-1ubuntu3 [97.2 kB]
==> master: Get:31 http://archive.ubuntu.com/ubuntu/ trusty-updates/main python-wheel all 0.24.0-1~ubuntu1 [44.7 kB]
==> master: Get:32 http://archive.ubuntu.com/ubuntu/ trusty-updates/main unzip amd64 6.0-9ubuntu1.3 [157 kB]
==> master: dpkg-preconfigure: unable to re-open stdin: No such file or directory
==> master: Fetched 41.3 MB in 20s (2,027 kB/s)
==> master: Selecting previously unselected package libexpat1-dev:amd64.
==> master: (Reading database ... 61002 files and directories currently installed.)
==> master: Preparing to unpack .../libexpat1-dev_2.1.0-4ubuntu1.1_amd64.deb ...
==> master: Unpacking libexpat1-dev:amd64 (2.1.0-4ubuntu1.1) ...
==> master: Selecting previously unselected package libpython2.7-dev:amd64.
==> master: Preparing to unpack .../libpython2.7-dev_2.7.6-8ubuntu0.2_amd64.deb ...
==> master: Unpacking libpython2.7-dev:amd64 (2.7.6-8ubuntu0.2) ...
==> master: Selecting previously unselected package libstdc++-4.8-dev:amd64.
==> master: Preparing to unpack .../libstdc++-4.8-dev_4.8.4-2ubuntu1~14.04_amd64.deb ...
==> master: Unpacking libstdc++-4.8-dev:amd64 (4.8.4-2ubuntu1~14.04) ...
==> master: Selecting previously unselected package g++-4.8.
==> master: Preparing to unpack .../g++-4.8_4.8.4-2ubuntu1~14.04_amd64.deb ...
==> master: Unpacking g++-4.8 (4.8.4-2ubuntu1~14.04) ...
==> master: Selecting previously unselected package g++.
==> master: Preparing to unpack .../g++_4%3a4.8.2-1ubuntu6_amd64.deb ...
==> master: Unpacking g++ (4:4.8.2-1ubuntu6) ...
==> master: Selecting previously unselected package libdpkg-perl.
==> master: Preparing to unpack .../libdpkg-perl_1.17.5ubuntu5.4_all.deb ...
==> master: Unpacking libdpkg-perl (1.17.5ubuntu5.4) ...
==> master: Selecting previously unselected package dpkg-dev.
==> master: Preparing to unpack .../dpkg-dev_1.17.5ubuntu5.4_all.deb ...
==> master: Unpacking dpkg-dev (1.17.5ubuntu5.4) ...
==> master: Selecting previously unselected package build-essential.
==> master: Preparing to unpack .../build-essential_11.6ubuntu6_amd64.deb ...
==> master: Unpacking build-essential (11.6ubuntu6) ...
==> master: Selecting previously unselected package libalgorithm-diff-perl.
==> master: Preparing to unpack .../libalgorithm-diff-perl_1.19.02-3_all.deb ...
==> master: Unpacking libalgorithm-diff-perl (1.19.02-3) ...
==> master: Selecting previously unselected package libalgorithm-diff-xs-perl.
==> master: Preparing to unpack .../libalgorithm-diff-xs-perl_0.04-2build4_amd64.deb ...
==> master: Unpacking libalgorithm-diff-xs-perl (0.04-2build4) ...
==> master: Selecting previously unselected package libalgorithm-merge-perl.
==> master: Preparing to unpack .../libalgorithm-merge-perl_0.08-2_all.deb ...
==> master: Unpacking libalgorithm-merge-perl (0.08-2) ...
==> master: Selecting previously unselected package libfile-fcntllock-perl.
==> master: Preparing to unpack .../libfile-fcntllock-perl_0.14-2build1_amd64.deb ...
==> master: Unpacking libfile-fcntllock-perl (0.14-2build1) ...
==> master: Selecting previously unselected package libpython-dev:amd64.
==> master: Preparing to unpack .../libpython-dev_2.7.5-5ubuntu3_amd64.deb ...
==> master: Unpacking libpython-dev:amd64 (2.7.5-5ubuntu3) ...
==> master: Selecting previously unselected package python3-pkg-resources.
==> master: Preparing to unpack .../python3-pkg-resources_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python3-pkg-resources (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-chardet-whl.
==> master: Preparing to unpack .../python-chardet-whl_2.2.1-2~ubuntu1_all.deb ...
==> master: Unpacking python-chardet-whl (2.2.1-2~ubuntu1) ...
==> master: Selecting previously unselected package python-colorama.
==> master: Preparing to unpack .../python-colorama_0.2.5-0.1ubuntu2_all.deb ...
==> master: Unpacking python-colorama (0.2.5-0.1ubuntu2) ...
==> master: Selecting previously unselected package python-colorama-whl.
==> master: Preparing to unpack .../python-colorama-whl_0.2.5-0.1ubuntu2_all.deb ...
==> master: Unpacking python-colorama-whl (0.2.5-0.1ubuntu2) ...
==> master: Selecting previously unselected package python2.7-dev.
==> master: Preparing to unpack .../python2.7-dev_2.7.6-8ubuntu0.2_amd64.deb ...
==> master: Unpacking python2.7-dev (2.7.6-8ubuntu0.2) ...
==> master: Selecting previously unselected package python-dev.
==> master: Preparing to unpack .../python-dev_2.7.5-5ubuntu3_amd64.deb ...
==> master: Unpacking python-dev (2.7.5-5ubuntu3) ...
==> master: Selecting previously unselected package python-distlib.
==> master: Preparing to unpack .../python-distlib_0.1.8-1ubuntu1_all.deb ...
==> master: Unpacking python-distlib (0.1.8-1ubuntu1) ...
==> master: Selecting previously unselected package python-distlib-whl.
==> master: Preparing to unpack .../python-distlib-whl_0.1.8-1ubuntu1_all.deb ...
==> master: Unpacking python-distlib-whl (0.1.8-1ubuntu1) ...
==> master: Selecting previously unselected package python-html5lib.
==> master: Preparing to unpack .../python-html5lib_0.999-3~ubuntu1_all.deb ...
==> master: Unpacking python-html5lib (0.999-3~ubuntu1) ...
==> master: Selecting previously unselected package python-html5lib-whl.
==> master: Preparing to unpack .../python-html5lib-whl_0.999-3~ubuntu1_all.deb ...
==> master: Unpacking python-html5lib-whl (0.999-3~ubuntu1) ...
==> master: Selecting previously unselected package python-six-whl.
==> master: Preparing to unpack .../python-six-whl_1.5.2-1ubuntu1_all.deb ...
==> master: Unpacking python-six-whl (1.5.2-1ubuntu1) ...
==> master: Selecting previously unselected package python-urllib3-whl.
==> master: Preparing to unpack .../python-urllib3-whl_1.7.1-1ubuntu3_all.deb ...
==> master: Unpacking python-urllib3-whl (1.7.1-1ubuntu3) ...
==> master: Selecting previously unselected package python-requests-whl.
==> master: Preparing to unpack .../python-requests-whl_2.2.1-1ubuntu0.3_all.deb ...
==> master: Unpacking python-requests-whl (2.2.1-1ubuntu0.3) ...
==> master: Selecting previously unselected package python-setuptools-whl.
==> master: Preparing to unpack .../python-setuptools-whl_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python-setuptools-whl (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-pip-whl.
==> master: Preparing to unpack .../python-pip-whl_1.5.4-1ubuntu3_all.deb ...
==> master: Unpacking python-pip-whl (1.5.4-1ubuntu3) ...
==> master: Selecting previously unselected package python-setuptools.
==> master: Preparing to unpack .../python-setuptools_3.3-1ubuntu2_all.deb ...
==> master: Unpacking python-setuptools (3.3-1ubuntu2) ...
==> master: Selecting previously unselected package python-pip.
==> master: Preparing to unpack .../python-pip_1.5.4-1ubuntu3_all.deb ...
==> master: Unpacking python-pip (1.5.4-1ubuntu3) ...
==> master: Selecting previously unselected package python-wheel.
==> master: Preparing to unpack .../python-wheel_0.24.0-1~ubuntu1_all.deb ...
==> master: Unpacking python-wheel (0.24.0-1~ubuntu1) ...
==> master: Selecting previously unselected package unzip.
==> master: Preparing to unpack .../unzip_6.0-9ubuntu1.3_amd64.deb ...
==> master: Unpacking unzip (6.0-9ubuntu1.3) ...
==> master: Processing triggers for man-db (2.6.7.1-1ubuntu1) ...
==> master: Processing triggers for mime-support (3.54ubuntu1.1) ...
==> master: Setting up libexpat1-dev:amd64 (2.1.0-4ubuntu1.1) ...
==> master: Setting up libpython2.7-dev:amd64 (2.7.6-8ubuntu0.2) ...
==> master: Setting up libstdc++-4.8-dev:amd64 (4.8.4-2ubuntu1~14.04) ...
==> master: Setting up g++-4.8 (4.8.4-2ubuntu1~14.04) ...
==> master: Setting up g++ (4:4.8.2-1ubuntu6) ...
==> master: update-alternatives: using /usr/bin/g++ to provide /usr/bin/c++ (c++) in auto mode
==> master: Setting up libdpkg-perl (1.17.5ubuntu5.4) ...
==> master: Setting up dpkg-dev (1.17.5ubuntu5.4) ...
==> master: Setting up build-essential (11.6ubuntu6) ...
==> master: Setting up libalgorithm-diff-perl (1.19.02-3) ...
==> master: Setting up libalgorithm-diff-xs-perl (0.04-2build4) ...
==> master: Setting up libalgorithm-merge-perl (0.08-2) ...
==> master: Setting up libfile-fcntllock-perl (0.14-2build1) ...
==> master: Setting up libpython-dev:amd64 (2.7.5-5ubuntu3) ...
==> master: Setting up python3-pkg-resources (3.3-1ubuntu2) ...
==> master: Setting up python-chardet-whl (2.2.1-2~ubuntu1) ...
==> master: Setting up python-colorama (0.2.5-0.1ubuntu2) ...
==> master: Setting up python-colorama-whl (0.2.5-0.1ubuntu2) ...
==> master: Setting up python2.7-dev (2.7.6-8ubuntu0.2) ...
==> master: Setting up python-dev (2.7.5-5ubuntu3) ...
==> master: Setting up python-distlib (0.1.8-1ubuntu1) ...
==> master: Setting up python-distlib-whl (0.1.8-1ubuntu1) ...
==> master: Setting up python-html5lib (0.999-3~ubuntu1) ...
==> master: Setting up python-html5lib-whl (0.999-3~ubuntu1) ...
==> master: Setting up python-six-whl (1.5.2-1ubuntu1) ...
==> master: Setting up python-urllib3-whl (1.7.1-1ubuntu3) ...
==> master: Setting up python-requests-whl (2.2.1-1ubuntu0.3) ...
==> master: Setting up python-setuptools-whl (3.3-1ubuntu2) ...
==> master: Setting up python-pip-whl (1.5.4-1ubuntu3) ...
==> master: Setting up python-setuptools (3.3-1ubuntu2) ...
==> master: Setting up python-pip (1.5.4-1ubuntu3) ...
==> master: Setting up python-wheel (0.24.0-1~ubuntu1) ...
==> master: Setting up unzip (6.0-9ubuntu1.3) ...
==> master: Downloading/unpacking ansible
==> master:   Running setup.py (path:/tmp/pip_build_root/ansible/setup.py) egg_info for package ansible
==> master:
==> master:     no previously-included directories found matching 'v2'
==> master:     no previously-included directories found matching 'docsite'
==> master:     no previously-included directories found matching 'ticket_stubs'
==> master:     no previously-included directories found matching 'packaging'
==> master:     no previously-included directories found matching 'test'
==> master:     no previously-included directories found matching 'hacking'
==> master:     no previously-included directories found matching 'lib/ansible/modules/core/.git'
==> master:     no previously-included directories found matching 'lib/ansible/modules/extras/.git'
==> master: Downloading/unpacking paramiko (from ansible)
==> master: Downloading/unpacking jinja2 (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): PyYAML in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): setuptools in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Requirement already satisfied (use --upgrade to upgrade): pycrypto>=2.6 in /usr/lib/python2.7/dist-packages (from ansible)
==> master: Downloading/unpacking ecdsa>=0.11 (from paramiko->ansible)
==> master: Downloading/unpacking MarkupSafe (from jinja2->ansible)
==> master:   Downloading MarkupSafe-0.23.tar.gz
==> master:   Running setup.py (path:/tmp/pip_build_root/MarkupSafe/setup.py) egg_info for package MarkupSafe
==> master:
==> master: Installing collected packages: ansible, paramiko, jinja2, ecdsa, MarkupSafe
==> master:   Running setup.py install for ansible
==> master:     changing mode of build/scripts-2.7/ansible from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-playbook from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-pull from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-doc from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-galaxy from 644 to 755
==> master:     changing mode of build/scripts-2.7/ansible-vault from 644 to 755
==> master:
==> master:     no previously-included directories found matching 'v2'
==> master:     no previously-included directories found matching 'docsite'
==> master:     no previously-included directories found matching 'ticket_stubs'
==> master:     no previously-included directories found matching 'test'
==> master:     no previously-included directories found matching 'hacking'
==> master:     no previously-included directories found matching 'lib/ansible/modules/core/.git'
==> master:     no previously-included directories found matching 'lib/ansible/modules/extras/.git'
==> master:     changing mode of /usr/local/bin/ansible-galaxy to 755
==> master:     changing mode of /usr/local/bin/ansible-playbook to 755
==> master:     changing mode of /usr/local/bin/ansible-doc to 755
==> master:     changing mode of /usr/local/bin/ansible-pull to 755
==> master:     changing mode of /usr/local/bin/ansible-vault to 755
==> master:     changing mode of /usr/local/bin/ansible to 755
==> master:   Running setup.py install for MarkupSafe
==> master:
==> master:     building 'markupsafe._speedups' extension
==> master:     x86_64-linux-gnu-gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c markupsafe/_speedups.c -o build/temp.linux-x86_64-2.7/markupsafe/_speedups.o
==> master:     x86_64-linux-gnu-gcc -pthread -shared -Wl,-O1 -Wl,-Bsymbolic-functions -Wl,-Bsymbolic-functions -Wl,-z,relro -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security build/temp.linux-x86_64-2.7/markupsafe/_speedups.o -o build/lib.linux-x86_64-2.7/markupsafe/_speedups.so
==> master: Successfully installed ansible paramiko jinja2 ecdsa MarkupSafe
==> master: Cleaning up...
==> master: # 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
==> master: # 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
==> master: read (192.168.51.6): No route to host
==> master: read (192.168.51.6): No route to host
==> data1: Importing base box 'ubuntu/trusty64'...
==> data1: Matching MAC address for NAT networking...
==> data1: Checking if box 'ubuntu/trusty64' is up to date...
==> data1: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> data1: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> data1: `vagrant box update` to update.
==> data1: Setting the name of the VM: data1
==> data1: Clearing any previously set forwarded ports...
==> data1: Fixed port collision for 22 => 2222. Now on port 2200.
==> data1: Clearing any previously set network interfaces...
==> data1: Preparing network interfaces based on configuration...
    data1: Adapter 1: nat
    data1: Adapter 2: hostonly
==> data1: Forwarding ports...
    data1: 22 => 2200 (adapter 1)
==> data1: Running 'pre-boot' VM customizations...
==> data1: Booting VM...
==> data1: Waiting for machine to boot. This may take a few minutes...
    data1: SSH address: 127.0.0.1:2200
    data1: SSH username: vagrant
    data1: SSH auth method: private key
    data1: Warning: Connection timeout. Retrying...
==> data1: Machine booted and ready!
==> data1: Checking for guest additions in VM...
==> data1: Setting hostname...
==> data1: Configuring and enabling network interfaces...
==> data1: Mounting shared folders...
    data1: /vagrant => C:/Users/watrous/Documents/hadoop
==> data1: Running provisioner: file...
==> data2: Importing base box 'ubuntu/trusty64'...
==> data2: Matching MAC address for NAT networking...
==> data2: Checking if box 'ubuntu/trusty64' is up to date...
==> data2: A newer version of the box 'ubuntu/trusty64' is available! You currently
==> data2: have version '20150916.0.0'. The latest is version '20150924.0.0'. Run
==> data2: `vagrant box update` to update.
==> data2: Setting the name of the VM: data2
==> data2: Clearing any previously set forwarded ports...
==> data2: Fixed port collision for 22 => 2222. Now on port 2201.
==> data2: Clearing any previously set network interfaces...
==> data2: Preparing network interfaces based on configuration...
    data2: Adapter 1: nat
    data2: Adapter 2: hostonly
==> data2: Forwarding ports...
    data2: 22 => 2201 (adapter 1)
==> data2: Running 'pre-boot' VM customizations...
==> data2: Booting VM...
==> data2: Waiting for machine to boot. This may take a few minutes...
    data2: SSH address: 127.0.0.1:2201
    data2: SSH username: vagrant
    data2: SSH auth method: private key
    data2: Warning: Connection timeout. Retrying...
==> data2: Machine booted and ready!
==> data2: Checking for guest additions in VM...
==> data2: Setting hostname...
==> data2: Configuring and enabling network interfaces...
==> data2: Mounting shared folders...
    data2: /vagrant => C:/Users/watrous/Documents/hadoop
==> data2: Running provisioner: file...

The output above shows the bootstrap-master.sh script installing Ansible and the other required libraries. At this point all three servers are ready for Hadoop to be installed, and your VirtualBox console should look something like this:

[Screenshot: virtualbox-hadoop-hosts — the three VMs running in VirtualBox]

Limit to a single datanode

If you are low on RAM, you can make a couple of small changes to install only two servers with the same effect. To do this, change the following files:

  • Vagrantfile: Remove or comment the definition of the unwanted datanode
  • group_vars/all: Remove or comment the unused host
  • hosts-dev: Remove or comment the unused host

Conversely, it is possible to add as many datanodes as you like by modifying the same files. Those changes will trickle through to as many hosts as you define. I’ll discuss that more in a future post, when we use these same Ansible scripts to deploy to a cloud provider.
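For orientation, the hosts-dev inventory for a cluster like this might look roughly as follows (a hypothetical sketch — the group names here are assumptions; the file in the repository is authoritative). Adding or removing a datanode means adding or commenting a line here and the matching entries in group_vars/all and the Vagrantfile:

```ini
# hosts-dev (hypothetical sketch; actual group names may differ)
[master]
192.168.51.4

[datanodes]
192.168.51.5
192.168.51.6
```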

Install Hadoop

It’s now time to install Hadoop. There are several commented lines in the bootstrap-master.sh script that you can copy and paste to perform the next few steps. The easiest approach is to log in to the hadoop-master server and run the Ansible playbook.

Proxy management

If you happen to be behind a proxy, you’ll need to update the proxy settings in bootstrap-master.sh and group_vars/all. If you don’t have a proxy, leave the none: false setting in group_vars/all in place; the Ansible playbook expects that value to be a dictionary and will fail if it is removed.
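As a sketch of what that means in group_vars/all (the top-level key name and proxy host here are assumptions for illustration; only the none: false placeholder behavior is described above):

```yaml
# group_vars/all — proxy section (sketch; names other than "none: false" are assumptions)
# No proxy: leave the placeholder entry so the value stays a dictionary
proxy_env:
  none: false

# Behind a proxy: replace the placeholder with real settings, e.g.
# proxy_env:
#   http_proxy: http://proxy.example.com:8080
#   https_proxy: http://proxy.example.com:8080
```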

Run the Ansible playbook

Below you can see the Ansible output from configuring and installing Hadoop and all its dependencies on all three servers in your new cluster.

vagrant@hadoop-master:~$ cd src/
vagrant@hadoop-master:~/src$ ansible-playbook -i hosts-dev playbook.yml
 
PLAY [Install hadoop master node] *********************************************
 
GATHERING FACTS ***************************************************************
ok: [192.168.51.4]
 
TASK: [common | group name=hadoop state=present] ******************************
changed: [192.168.51.4]
 
TASK: [common | user name=hadoop comment="Hadoop" group=hadoop shell=/bin/bash] ***
changed: [192.168.51.4]
 
TASK: [common | authorized_key user=hadoop key="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWeJfgWx7hDeZUJOeaIVzcbmYxzMcWfxhgC2975tvGL5BV6unzLz8ZVak6ju++AvnM5mcQp6Ydv73uWyaoQaFZigAzfuenruQkwc7D5YYuba+FgZdQ8VHon29oQA3iaZWG7xTspagrfq3fcqaz2ZIjzqN+E/MtcW08PwfibN2QRWchBCuZ1Q8AmrW7gClzMcgd/uj3TstabspGaaZMCs8aC9JWzZlMMegXKYHvVQs6xH2AmifpKpLoMTdO8jP4jczmGebPzvaXmvVylgwo6bRJ3tyYAmGwx8PHj2EVVQ0XX9ipgixLyAa2c7+/crPpGmKFRrYibCCT6x65px7nWnn3"] ***
changed: [192.168.51.4]
 
TASK: [common | unpack hadoop] ************************************************
changed: [192.168.51.4]
 
TASK: [common | command mv /usr/local/hadoop-2.7.1 /usr/local/hadoop creates=/usr/local/hadoop removes=/usr/local/hadoop-2.7.1] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_HOME=" line="export HADOOP_HOME=/usr/local/hadoop"] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="PATH=" line="export PATH=$PATH:$HADOOP_HOME/bin"] ***
changed: [192.168.51.4]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_SSH_OPTS=" line="export HADOOP_SSH_OPTS=\"-i /home/hadoop/.ssh/hadoop_rsa\""] ***
changed: [192.168.51.4]
 
TASK: [common | Build hosts file] *********************************************
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [common | lineinfile dest=/etc/hosts regexp='127.0.1.1' state=absent] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/tmp state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/namenode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/datanode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.4]
 
TASK: [common | Add the service scripts] **************************************
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.4] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
 
TASK: [common | lineinfile dest=/usr/local/hadoop/etc/hadoop/hadoop-env.sh regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.4]
 
TASK: [common | ensure hostkeys is a known host] ******************************
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [oraclejava8 | apt_repository repo='deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | apt_repository repo='deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | debconf name='oracle-java8-installer' question='shared/accepted-oracle-license-v1-1' value='true' vtype='select' unseen=false] ***
changed: [192.168.51.4]
 
TASK: [oraclejava8 | apt_key keyserver=keyserver.ubuntu.com id=EEA14886] ******
changed: [192.168.51.4]
 
TASK: [oraclejava8 | Install Java] ********************************************
changed: [192.168.51.4]
 
TASK: [oraclejava8 | lineinfile dest=/home/hadoop/.bashrc regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.4]
 
TASK: [master | Copy private key into place] **********************************
changed: [192.168.51.4]
 
TASK: [master | Copy slaves into place] ***************************************
changed: [192.168.51.4]
 
TASK: [master | prepare known_hosts] ******************************************
# 192.168.51.4 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# 192.168.51.5 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# 192.168.51.6 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [master | add 0.0.0.0 to known_hosts for secondary namenode] ************
# 0.0.0.0 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.4]
 
PLAY [Install hadoop data nodes] **********************************************
 
GATHERING FACTS ***************************************************************
ok: [192.168.51.5]
ok: [192.168.51.6]
 
TASK: [common | group name=hadoop state=present] ******************************
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | user name=hadoop comment="Hadoop" group=hadoop shell=/bin/bash] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | authorized_key user=hadoop key="ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDWeJfgWx7hDeZUJOeaIVzcbmYxzMcWfxhgC2975tvGL5BV6unzLz8ZVak6ju++AvnM5mcQp6Ydv73uWyaoQaFZigAzfuenruQkwc7D5YYuba+FgZdQ8VHon29oQA3iaZWG7xTspagrfq3fcqaz2ZIjzqN+E/MtcW08PwfibN2QRWchBCuZ1Q8AmrW7gClzMcgd/uj3TstabspGaaZMCs8aC9JWzZlMMegXKYHvVQs6xH2AmifpKpLoMTdO8jP4jczmGebPzvaXmvVylgwo6bRJ3tyYAmGwx8PHj2EVVQ0XX9ipgixLyAa2c7+/crPpGmKFRrYibCCT6x65px7nWnn3"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | unpack hadoop] ************************************************
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | command mv /usr/local/hadoop-2.7.1 /usr/local/hadoop creates=/usr/local/hadoop removes=/usr/local/hadoop-2.7.1] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_HOME=" line="export HADOOP_HOME=/usr/local/hadoop"] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="PATH=" line="export PATH=$PATH:$HADOOP_HOME/bin"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | lineinfile dest=/home/hadoop/.bashrc regexp="HADOOP_SSH_OPTS=" line="export HADOOP_SSH_OPTS=\"-i /home/hadoop/.ssh/hadoop_rsa\""] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | Build hosts file] *********************************************
changed: [192.168.51.5] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.5] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
changed: [192.168.51.5] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [common | lineinfile dest=/etc/hosts regexp='127.0.1.1' state=absent] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | file path=/home/hadoop/tmp state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/namenode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | file path=/home/hadoop/hadoop-data/hdfs/datanode state=directory owner=hadoop group=hadoop mode=750] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | Add the service scripts] **************************************
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/core-site.xml', 'src': 'core-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/hdfs-site.xml', 'src': 'hdfs-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/yarn-site.xml', 'src': 'yarn-site.xml'})
changed: [192.168.51.6] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
changed: [192.168.51.5] => (item={'dest': '/usr/local/hadoop/etc/hadoop/mapred-site.xml', 'src': 'mapred-site.xml'})
 
TASK: [common | lineinfile dest=/usr/local/hadoop/etc/hadoop/hadoop-env.sh regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [common | ensure hostkeys is a known host] ******************************
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
# hadoop-master SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.4', 'hostname': 'hadoop-master'})
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
# hadoop-data1 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.6] => (item={'ip': '192.168.51.5', 'hostname': 'hadoop-data1'})
# hadoop-data2 SSH-2.0-OpenSSH_6.6.1p1 Ubuntu-2ubuntu2.3
changed: [192.168.51.5] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
changed: [192.168.51.6] => (item={'ip': '192.168.51.6', 'hostname': 'hadoop-data2'})
 
TASK: [oraclejava8 | apt_repository repo='deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | apt_repository repo='deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main' state=present] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | debconf name='oracle-java8-installer' question='shared/accepted-oracle-license-v1-1' value='true' vtype='select' unseen=false] ***
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | apt_key keyserver=keyserver.ubuntu.com id=EEA14886] ******
changed: [192.168.51.5]
changed: [192.168.51.6]
 
TASK: [oraclejava8 | Install Java] ********************************************
changed: [192.168.51.6]
changed: [192.168.51.5]
 
TASK: [oraclejava8 | lineinfile dest=/home/hadoop/.bashrc regexp="^export JAVA_HOME" line="export JAVA_HOME=/usr/lib/jvm/java-8-oracle"] ***
changed: [192.168.51.6]
changed: [192.168.51.5]
 
PLAY RECAP ********************************************************************
192.168.51.4               : ok=27   changed=26   unreachable=0    failed=0
192.168.51.5               : ok=23   changed=22   unreachable=0    failed=0
192.168.51.6               : ok=23   changed=22   unreachable=0    failed=0

Start Hadoop and run a job

Now that you have Hadoop installed, it’s time to format HDFS and start up all the services. All the commands to do this are available as comments in the bootstrap-master.sh file. The first step is to format the HDFS namenode. All of the commands that follow are executed as the hadoop user.

vagrant@hadoop-master:~/src$ sudo su - hadoop
hadoop@hadoop-master:~$ hdfs namenode -format
15/09/30 16:06:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = hadoop-master/192.168.51.4
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
STARTUP_MSG:   classpath = [truncated]
STARTUP_MSG:   build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
STARTUP_MSG:   java = 1.8.0_60
************************************************************/
15/09/30 16:06:36 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/30 16:06:36 INFO namenode.NameNode: createNameNode [-format]
15/09/30 16:06:36 WARN common.Util: Path /home/hadoop/hadoop-data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
15/09/30 16:06:36 WARN common.Util: Path /home/hadoop/hadoop-data/hdfs/namenode should be specified as a URI in configuration files. Please update hdfs configuration.
Formatting using clusterid: CID-1c37e2f0-ba4b-4ad7-84d7-223dec53d34a
15/09/30 16:06:36 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/30 16:06:36 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/30 16:06:36 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/30 16:06:36 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/30 16:06:36 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/30 16:06:36 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Sep 30 16:06:36
15/09/30 16:06:36 INFO util.GSet: Computing capacity for map BlocksMap
15/09/30 16:06:36 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:36 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/09/30 16:06:36 INFO util.GSet: capacity      = 2^21 = 2097152 entries
15/09/30 16:06:36 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: defaultReplication         = 2
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxReplication             = 512
15/09/30 16:06:36 INFO blockmanagement.BlockManager: minReplication             = 1
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxReplicationStreams      = 2
15/09/30 16:06:36 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/30 16:06:36 INFO blockmanagement.BlockManager: encryptDataTransfer        = false
15/09/30 16:06:36 INFO blockmanagement.BlockManager: maxNumBlocksToLog          = 1000
15/09/30 16:06:36 INFO namenode.FSNamesystem: fsOwner             = hadoop (auth:SIMPLE)
15/09/30 16:06:36 INFO namenode.FSNamesystem: supergroup          = supergroup
15/09/30 16:06:36 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/30 16:06:36 INFO namenode.FSNamesystem: HA Enabled: false
15/09/30 16:06:36 INFO namenode.FSNamesystem: Append Enabled: true
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map INodeMap
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^20 = 1048576 entries
15/09/30 16:06:37 INFO namenode.FSDirectory: ACLs enabled? false
15/09/30 16:06:37 INFO namenode.FSDirectory: XAttrs enabled? true
15/09/30 16:06:37 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
15/09/30 16:06:37 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^18 = 262144 entries
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/30 16:06:37 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension     = 30000
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/30 16:06:37 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/30 16:06:37 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/30 16:06:37 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/30 16:06:37 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/30 16:06:37 INFO util.GSet: VM type       = 64-bit
15/09/30 16:06:37 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/09/30 16:06:37 INFO util.GSet: capacity      = 2^15 = 32768 entries
15/09/30 16:06:37 INFO namenode.FSImage: Allocated new BlockPoolId: BP-992546781-192.168.51.4-1443629197156
15/09/30 16:06:37 INFO common.Storage: Storage directory /home/hadoop/hadoop-data/hdfs/namenode has been successfully formatted.
15/09/30 16:06:37 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/30 16:06:37 INFO util.ExitUtil: Exiting with status 0
15/09/30 16:06:37 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hadoop-master/192.168.51.4
************************************************************/

Start DFS

Next start the dfs services, as shown.

hadoop@hadoop-master:~$ /usr/local/hadoop/sbin/start-dfs.sh
Starting namenodes on [hadoop-master]
hadoop-master: Warning: Permanently added the RSA host key for IP address '192.168.51.4' to the list of known hosts.
hadoop-master: starting namenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-namenode-hadoop-master.out
hadoop-data2: Warning: Permanently added the RSA host key for IP address '192.168.51.6' to the list of known hosts.
hadoop-data1: Warning: Permanently added the RSA host key for IP address '192.168.51.5' to the list of known hosts.
hadoop-master: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-master.out
hadoop-data2: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-data2.out
hadoop-data1: starting datanode, logging to /usr/local/hadoop/logs/hadoop-hadoop-datanode-hadoop-data1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-hadoop-secondarynamenode-hadoop-master.out

At this point you can access the HDFS status and see all three datanodes attached with this URL: http://192.168.51.4:50070/dfshealth.html#tab-datanode.
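If you would rather verify the datanodes from the command line, the dfsadmin report shows the same information. This assumes you are still the hadoop user with $HADOOP_HOME/bin on the PATH, as arranged by the playbook:

```shell
# Summarize cluster health; each attached datanode appears with a Name: line.
hdfs dfsadmin -report
# For this setup, a healthy cluster reports "Live datanodes (3)" with one
# entry per host in the 192.168.51.x range.
```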

Start yarn

Next start the yarn service as shown.

hadoop@hadoop-master:~$ /usr/local/hadoop/sbin/start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-resourcemanager-hadoop-master.out
hadoop-data2: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-data2.out
hadoop-data1: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-data1.out
hadoop-master: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-hadoop-nodemanager-hadoop-master.out

At this point you can access information about the compute nodes in the cluster and currently running jobs at this URL: http://192.168.51.4:8088/cluster/nodes
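The NodeManagers can also be confirmed from the command line rather than the web UI:

```shell
# List the YARN nodes; once they register, all three hosts should
# show up in the RUNNING state.
yarn node -list
```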

Verify that Java processes are running

Hadoop provides a useful script to run a command on every node listed in the slaves file. For example, you can confirm that all of the expected Java processes are running with the following command.

hadoop@hadoop-master:~$ $HADOOP_HOME/sbin/slaves.sh jps
hadoop-data2: 3872 DataNode
hadoop-data2: 4180 Jps
hadoop-data2: 4021 NodeManager
hadoop-master: 7617 NameNode
hadoop-data1: 3872 DataNode
hadoop-data1: 4180 Jps
hadoop-master: 8675 Jps
hadoop-data1: 4021 NodeManager
hadoop-master: 8309 NodeManager
hadoop-master: 8150 ResourceManager
hadoop-master: 7993 SecondaryNameNode
hadoop-master: 7788 DataNode

Run an example job

Finally, it’s possible to confirm that everything is working as expected by running one of the example jobs. Let’s estimate the value of pi.

hadoop@hadoop-master:~$ hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 30
Number of Maps  = 10
Samples per Map = 30
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
15/09/30 19:54:28 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/192.168.51.4:8032
15/09/30 19:54:29 INFO input.FileInputFormat: Total input paths to process : 10
15/09/30 19:54:29 INFO mapreduce.JobSubmitter: number of splits:10
15/09/30 19:54:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443642855962_0001
15/09/30 19:54:29 INFO impl.YarnClientImpl: Submitted application application_1443642855962_0001
15/09/30 19:54:29 INFO mapreduce.Job: The url to track the job: http://hadoop-master:8088/proxy/application_1443642855962_0001/
15/09/30 19:54:29 INFO mapreduce.Job: Running job: job_1443642855962_0001
15/09/30 19:54:38 INFO mapreduce.Job: Job job_1443642855962_0001 running in uber mode : false
15/09/30 19:54:38 INFO mapreduce.Job:  map 0% reduce 0%
15/09/30 19:54:52 INFO mapreduce.Job:  map 40% reduce 0%
15/09/30 19:54:56 INFO mapreduce.Job:  map 100% reduce 0%
15/09/30 19:54:59 INFO mapreduce.Job:  map 100% reduce 100%
15/09/30 19:54:59 INFO mapreduce.Job: Job job_1443642855962_0001 completed successfully
15/09/30 19:54:59 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=226
                FILE: Number of bytes written=1272744
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=2710
                HDFS: Number of bytes written=215
                HDFS: Number of read operations=43
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=3
        Job Counters
                Launched map tasks=10
                Launched reduce tasks=1
                Data-local map tasks=10
                Total time spent by all maps in occupied slots (ms)=140318
                Total time spent by all reduces in occupied slots (ms)=4742
                Total time spent by all map tasks (ms)=140318
                Total time spent by all reduce tasks (ms)=4742
                Total vcore-seconds taken by all map tasks=140318
                Total vcore-seconds taken by all reduce tasks=4742
                Total megabyte-seconds taken by all map tasks=143685632
                Total megabyte-seconds taken by all reduce tasks=4855808
        Map-Reduce Framework
                Map input records=10
                Map output records=20
                Map output bytes=180
                Map output materialized bytes=280
                Input split bytes=1530
                Combine input records=0
                Combine output records=0
                Reduce input groups=2
                Reduce shuffle bytes=280
                Reduce input records=20
                Reduce output records=0
                Spilled Records=40
                Shuffled Maps =10
                Failed Shuffles=0
                Merged Map outputs=10
                GC time elapsed (ms)=3509
                CPU time spent (ms)=5620
                Physical memory (bytes) snapshot=2688745472
                Virtual memory (bytes) snapshot=20847497216
                Total committed heap usage (bytes)=2040528896
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=1180
        File Output Format Counters
                Bytes Written=97
Job Finished in 31.245 seconds
Estimated value of Pi is 3.16000000000000000000
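As a further smoke test, not part of the repository scripts, you could run the wordcount example from the same jar. The input and output paths below are only illustrative:

```shell
# Create a small input file and copy it into HDFS.
echo "hello hadoop hello hdfs" > /tmp/words.txt
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put /tmp/words.txt /user/hadoop/input/

# Run the wordcount example against that input, then print the result.
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar \
  wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000
```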

Security and Configuration

This example is not production hardened. It does nothing to address firewall management, and the key management is deliberately permissive to make communication between nodes easy. For a production deployment it should be easy to add a role that configures the firewall, and you may also want to be more cautious about accepting host keys between hosts.
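As a hedged sketch of what such a firewall role might execute on these Ubuntu 14.04 hosts (ufw ships with Ubuntu; the port numbers are assumptions based on common Hadoop 2.7 defaults, not taken from the repository):

```shell
# Hypothetical hardening: deny by default, then allow only the cluster subnet.
sudo ufw default deny incoming
sudo ufw allow from 192.168.51.0/24 to any port 9000   # namenode RPC (fs.defaultFS)
sudo ufw allow from 192.168.51.0/24 to any port 50010  # datanode data transfer
sudo ufw allow from 192.168.51.0/24 to any port 50070  # namenode web UI
sudo ufw allow from 192.168.51.0/24 to any port 8088   # resourcemanager web UI
sudo ufw enable
```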

Default Ports

Lots of people ask what the default ports are for Hadoop services. The following four links list every property that can be set for the main components, along with the default used when a property is absent from the configuration file. Any property that isn’t overridden in the Ansible playbook role templates in the git repository keeps the default shown in these links.

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml
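One way to see which value is actually in effect on a node, default or overridden, is the getconf tool:

```shell
# Print the merged (file plus default) value of a property.
hdfs getconf -confKey fs.defaultFS
hdfs getconf -confKey dfs.replication
```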

Problems spanning subnets

While developing this automation, I originally had the datanodes running on a separate subnet. There’s a problem/bug with Hadoop that prevented nodes from communicating across subnets. The following thread covers some of the discussion.

http://mail-archives.apache.org/mod_mbox/hadoop-user/201509.mbox/%3CCAKFXasEROCe%2BfL%2B8T7A3L0j4Qrm%3D4HHuzGfJhNuZ5MqUvQ%3DwjA%40mail.gmail.com%3E

Resources

While developing my Ansible scripts, I leaned heavily on this tutorial:
https://chawlasumit.wordpress.com/2015/03/09/install-a-multi-node-hadoop-cluster-on-ubuntu-14-04/

Software Engineering

Explore CloudFoundry using bosh-lite on Windows

It seems like most of the development around CloudFoundry and bosh happens on Linux or Mac. Getting things up and running on Windows was a real challenge. Below is how I worked things out.

Note: make sure you have a modern processor that supports hardware virtualization technologies, such as VT-x and extended page tables.

Aside from the deviations mentioned below, I’m following the steps documented at https://github.com/cloudfoundry/bosh-lite

Changes to Vagrantfile

I’m using VirtualBox on Windows 7. To begin with, I modified the Vagrantfile to create two VMs rather than a single VM. The first VM will run CloudFoundry; the second runs bosh to drive the deployment of CloudFoundry. I use a second Linux VM for the bosh deployment since all the commands and files were developed in a *nix environment.

I am also more explicit in my network setup. I want the two hosts to have free communication on a local private network. I leave the default IP address assignment for the CloudFoundry host. For the bosh host I change the last octet of the IP address to 14.

  config.vm.define "cf" do |cf|
    cf.vm.provider :virtualbox do |v, override|
      override.vm.box = 'cloudfoundry/bosh-lite'
      override.vm.box_version = '388'
 
      # To use a different IP address for the bosh-lite director, uncomment this line:
      override.vm.network :private_network, ip: '192.168.50.4', id: :local
      override.vm.network :public_network
    end
  end
 
  config.vm.define "boshlite" do |boshlite|
    boshlite.vm.provider :virtualbox do |v, override|
      override.vm.box = 'ubuntu/trusty64'
 
      # To use a different IP address for the bosh-lite director, uncomment this line:
      override.vm.network :private_network, ip: '192.168.50.14', id: :local
      override.vm.network :public_network
      v.memory = 6144
      v.cpus = 2
    end
  end

At this point you can spin up the two hosts.

vagrant up --provider=virtualbox

The remaining steps need to happen on your bosh deployment host (192.168.50.14 based on the Vagrantfile above). In case you need it, here is a refresher on setting up Vagrant SSH connectivity using PuTTY on Windows.

Prepare for provision_cf

If you are in a proxied environment, you’ll need to set the environment variables, including no_proxy for the CloudFoundry host. I include xip.io for ease of access in future steps.

export http_proxy=http://proxy.domain.com:8080
export https_proxy=https://proxy.domain.com:8080
export no_proxy=192.168.50.4,xip.io

Next we need to install the prerequisites and then the bosh CLI. You may have some of these already, and you may need some additional libraries. This is based on a clean Ubuntu trusty 64 box.

sudo -E add-apt-repository multiverse
sudo -E apt-get update
sudo -E apt-get -y install build-essential linux-headers-`uname -r`
sudo -E apt-get -y install ruby ruby-dev git zip

Now bosh_cli can be installed. I’ve added flags to skip ‘ri’ and ‘rdoc’ since they take a long time. If you really want those, you can drop those arguments.

sudo -E gem install bosh_cli --no-ri --no-rdoc

We also need spiff on this system. Here I grab and unzip the latest spiff, then move the binary into /usr/local/bin.

wget https://github.com/cloudfoundry-incubator/spiff/releases/download/v1.0.3/spiff_linux_amd64.zip
unzip spiff_linux_amd64.zip
sudo mv spiff /usr/local/bin/

Next we need to clone both bosh-lite and cf-release. Even though the contents of bosh-lite are available in “/vagrant”, we need these two directories side by side, so it’s easiest to just clone them both into the home directory of the bosh deployment host. We then change into the bosh-lite directory.

git clone https://github.com/cloudfoundry/bosh-lite.git
git clone https://github.com/cloudfoundry/cf-release
cd bosh-lite/

The script ./bin/provision_cf needs to be edited so that get_ip_from_vagrant_ssh_config simply outputs the private network IP address that was assigned in the Vagrantfile. The default implementation assumes the provision script is run from the host running Vagrant and VirtualBox. However, these commands are running on the bosh deployment host, which doesn’t know anything about vagrant or virtualbox. Here’s what the function should look like.

get_ip_from_vagrant_ssh_config() {
  echo 192.168.50.4
}

Target bosh and provision

Everything is set to target the bosh host, set up the route and provision CloudFoundry. When you first target the CloudFoundry host, it will ask for credentials to log in.

vagrant@vagrant-ubuntu-trusty-64:~/bosh-lite$ bosh target 192.168.50.4 lite
Target set to `Bosh Lite Director'
Your username: admin
Enter password: *****
Logged in as `admin'

Next we can add the route to the bosh deployment host.

vagrant@vagrant-ubuntu-trusty-64:~/bosh-lite$ ./bin/add-route
Adding the following route entry to your local route table to enable direct warden container access. Your sudo password may be required.
  - net 10.244.0.0/19 via 192.168.50.4

Provision CloudFoundry

The only thing left to do is provision CloudFoundry.

./bin/provision_cf
...
Started         2014-09-29 18:54:39 UTC
Finished        2014-09-29 19:36:11 UTC
Duration        00:41:32
 
Deployed `cf-manifest.yml' to `Bosh Lite Director'

This takes quite a while (possibly hours depending on your hardware). If you have an older processor that doesn’t support all the modern virtualization technologies, this could take much longer.

Verify your new CloudFoundry deployment

In order to use CloudFoundry we need the ‘cf’ client. The cf client is available as a binary download from the main GitHub page for CloudFoundry. The following commands will prepare the cf CLI for use.

wget http://go-cli.s3-website-us-east-1.amazonaws.com/releases/v6.6.1/cf-linux-amd64.tgz
tar xzvf cf-linux-amd64.tgz
sudo mv cf /usr/local/bin/

With the cf CLI installed, it is now possible to connect to the API and set up org and space details.

cf api --skip-ssl-validation https://api.10.244.0.34.xip.io
cf auth admin admin
cf create-org myorg
cf target -o myorg
cf create-space mydept
cf target -o myorg -s mydept

You should now have an environment that matches the below.

API endpoint:   https://api.10.244.0.34.xip.io (API version: 2.14.0)
User:           admin
Org:            myorg
Space:          mydept

Deploy an app

You can now deploy an application. To verify, create a directory and add a file:

index.php

<?php phpinfo(); ?>

Now push that app as follows:

vagrant@vagrant-ubuntu-trusty-64:~/test-php$ cf push test-php
Creating app test-php in org myorg / space mydept as admin...
OK
 
Creating route test-php.10.244.0.34.xip.io...
OK
 
Binding test-php.10.244.0.34.xip.io to test-php...
OK
 
Uploading test-php...
Uploading app files from: /home/vagrant/test-php
Uploading 152, 1 files
OK
 
Starting app test-php in org myorg / space mydept as admin...
OK
-----> Downloaded app package (4.0K)
Use locally cached dependencies where possible
 !     WARNING:        No composer.json found.
       Using index.php to declare PHP applications is considered legacy
       functionality and may lead to unexpected behavior.
       See https://devcenter.heroku.com/categories/php
-----> Setting up runtime environment...
       - PHP 5.5.12
       - Apache 2.4.9
       - Nginx 1.4.6
-----> Installing PHP extensions:
       - opcache (automatic; bundled, using 'ext-opcache.ini')
-----> Installing dependencies...
       Composer version ac497feabaa0d247c441178b7b4aaa4c61b07399 2014-06-10 14:13:12
       Warning: This development build of composer is over 30 days old. It is recommended to update it by running "/app/.heroku/php/bin/composer self-update" to get the latest version.
       Loading composer repositories with package information
       Installing dependencies
       Nothing to install or update
       Generating optimized autoload files
-----> Building runtime environment...
       NOTICE: No Procfile, defaulting to 'web: vendor/bin/heroku-php-apache2'
-----> Uploading droplet (64M)
 
0 of 1 instances running, 1 starting
1 of 1 instances running
 
App started
 
Showing health and status for app test-php in org myorg / space mydept as admin...
OK
 
requested state: started
instances: 1/1
usage: 256M x 1 instances
urls: test-php.10.244.0.34.xip.io
 
     state     since                    cpu    memory          disk
#0   running   2014-09-29 07:52:38 PM   0.0%   84.9M of 256M   0 of 1G

It’s now possible to view the app using a browser. From the command line you can access it using this command:

w3m http://test-php.10.244.0.34.xip.io

Observations

In my tests, the xip.io resolution was flaky. I saw intermittent failures with the response:

dial tcp: lookup api.10.244.0.34.xip.io: no such host

In some cases I had to run the same command a few times before it could resolve the host.
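Until the DNS issue is resolved, a small retry wrapper can paper over the intermittent lookups. This is a sketch in bash; `true` stands in here for whatever cf command is failing, since the real calls require a running deployment:

```shell
#!/usr/bin/env bash
# Retry a command up to 5 times with a short pause between attempts.
# Example intended usage:
#   retry cf api --skip-ssl-validation https://api.10.244.0.34.xip.io
retry() {
  local attempts=5
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $attempts attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep 1
  done
}

retry true && echo "command succeeded"
```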

The VMs I set up obtained IP addresses on my network. However, when I tried to access apps or the API over that IP address, the connection was refused. Despite adding the domain (e.g. dhcpip.xip.io) to CloudFoundry and creating routes to my application, all attempts to use the API or load apps over the external IP failed.

Software Engineering

Explore CloudFoundry using Stackato and VirtualBox

Stackato, released by ActiveState, extends out-of-the-box CloudFoundry. It adds a web interface and a command line client (‘stackato’), although the existing ‘cf’ command line client still works (as long as versions match up). Stackato also includes some autoscale features and a very well done set of documentation.

ActiveState publishes various VM images that can be used to quickly spin up a development environment. These include images for VMware, KVM and VirtualBox, among others. In this post I’ll walk through getting a Stackato environment running on Windows using VirtualBox.

Install and Configure Stackato

The obvious first step is to download and install VirtualBox. The steps shown below should work on any system that runs VirtualBox.

I follow these steps to install and configure the Stackato VM.
http://docs.stackato.com/admin/setup/microcloud.html

After downloading the VirtualBox image for Stackato, I click “File->Import Appliance” in VirtualBox and navigate to the unzipped contents of the VM download. I select the OVF file and click Open.

import-stackato-vm-virtualbox

This process can take several minutes depending on your system.

import-stackato-vm-virtualbox-progress

After clicking next, you can configure various settings. Click the checkbox to Reinitialize the MAC address on all network cards.

import-stackato-vm-virtualbox-reinitialize-mac

Once the import completes, right click on the VM in VirtualBox and choose settings.

stackato-vm-virtualbox-settings

Navigate to the network settings and make sure the adapter is set to Bridged (the default is NAT). Bridged mode allows the VM to obtain its own IP address on the network, which will facilitate access later on.

stackato-vm-virtualbox-settings-network

Depending on your system resources, you may also want to go into the System settings and increase the Base Memory and number of Processors. You can also ensure that all virtualization accelerators are enabled.

Launch the Stackato VM

After you click OK, the VM is ready to launch. When you launch the VM, there is an initial message asking if you want to boot into recovery mode. Choose regular boot and you will then see a message about the initial setup.

launch-stackato-virtualbox-initial-setup

Eventually the screen will show system status. You’ll notice that it bounces around until it finally settles on a green READY status.

launch-stackato-virtualbox-ready

At this point you have a running instance of Stackato.

Configure Stackato

You can see in the above screenshot that Stackato displays a URL for the management console and the IP address of the system. In order to complete the configuration of the system (and before you can SSH into the server), you need to access the web console using that URL. This may require that you edit your local hosts file (/etc/hosts on Linux or C:\Windows\System32\drivers\etc\hosts on Windows). The entries in your hosts file should look something like this:

16.85.146.131	stackato-7kgb.local
16.85.146.131	api.stackato-7kgb.local
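On a Linux or Mac workstation, the same two entries can be generated from a shell. This is a sketch using the example IP and node name above; substitute your own values, and pipe the output through sudo tee -a /etc/hosts to install it:

```shell
# build the two /etc/hosts entries for the Stackato console
IP=16.85.146.131              # example IP from above
NODE=stackato-7kgb.local      # example node name from above
printf '%s\t%s\n%s\t%s\n' "$IP" "$NODE" "$IP" "api.$NODE"
# to install: printf ... | sudo tee -a /etc/hosts
```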

You can now access the console using https://api.stackato-7kgb.local. Don’t forget to put “https://” in front so the browser knows to request that URL rather than search for it. The server is also using a self-signed certificate, so you can expect your browser to complain. It’s OK to tell your browser to load the page despite the self-signed certificate.

On the page that loads, you need to provide a username, email address, and some other details to configure this Stackato installation. Provide the requested details and click to set up the initial admin user.

stackato-cloudfoundry-configuration

A couple of things just happened. First, a user was created with the username you provided. The password you chose also becomes the password for the system user ‘stackato’, which is important because it allows you to SSH into your instance.

Wildcard DNS

Wildcard DNS, using a service like xip.io, will make it easier to access Stackato and any applications that you deploy there. First we log in to the VM over SSH and use the node rename command to enable wildcard DNS.

kato node rename 16.85.146.131.xip.io

After running the command above, Stackato stops and restarts the affected roles, which takes a few minutes. Once the server returns to the READY state, the console is available using the xip.io address, https://api.16.85.146.131.xip.io. The same applies to any applications that are deployed.

The entries in the local hosts file are no longer needed and can be removed.

Proxy Configuration

Many enterprise environments route internet access through a proxy. If this is the case for you, it’s possible to identify the upstream proxy for all Stackato related services. Run the following commands on the Stackato VM to enable proxy access.

kato op upstream_proxy set proxy.domain.com:8080
sudo /etc/init.d/polipo restart

It may also be necessary to set the http_proxy and https_proxy environment variables by way of the .bashrc for the stackato system user.
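For example, appending something like the following to the stackato user’s ~/.bashrc would do it (proxy.domain.com:8080 is the same placeholder used above; substitute your real proxy host and port):

```shell
# append proxy exports to the current user's .bashrc
cat >> "$HOME/.bashrc" <<'EOF'
export http_proxy=http://proxy.domain.com:8080
export https_proxy=http://proxy.domain.com:8080
EOF
# confirm the two lines landed
tail -n 2 "$HOME/.bashrc"
```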

At this point you should be able to easily deploy a new app from the app store using the Stackato console. Let’s turn our attention now to using a client to deploy a simple app.

Use the stackato CLI to Deploy an App

The same virtual machine that runs Stackato also includes the command line client. That means it can be used to deploy a simple application and verify that Stackato is working properly. To do this, first connect to the VM using SSH. Once connected, the following steps will prepare the command line client to deploy a simple application.

  1. set the target
  2. login/authenticate
  3. push the app

To set the target and log in, we use these commands:

stackato target api.16.85.146.131.xip.io
stackato login watrous

The output of the commands can be seen in the image below:

stackato-cli-target-login

Example Python App

There is a simple python bottle application we can use to confirm our Stackato deployment. To deploy we first clone, then use the stackato client to push, as shown below.

stackato-cli-deploy-push-python

Here are those commands in plain text:

git clone https://github.com/Stackato-Apps/bottle-py3.git
cd bottle-py3/
stackato push -n

Using the URL provided in the output from ‘stackato push’ we can view the new app.

stackato-python-bottle-app

You can now scale up instances and manage other aspects of the app using the web console or the stackato client.

Software Engineering

Using Vagrant to build a LEMP stack

I may have just fallen in love with the tool Vagrant. Vagrant makes it possible to quickly create a virtual environment for development. It is different from cloning or snapshots in that it uses minimal base OSes and provides a provisioning mechanism to set up and configure the environment exactly the way you want. I love this for a few reasons:

  • All developers work in the exact same environment
  • Developers can get a new environment up in minutes
  • Developers don’t need to be experts at setting up the environment
  • System details can be versioned and stored alongside code

This short tutorial below demonstrates how easy it is to build a LEMP stack using Vagrant.

Install VirtualBox

Vagrant is not a virtualization tool. Instead, Vagrant leverages an existing provider of virtual compute resources, either local or remote. For example, Vagrant can be used to create a virtual environment on Amazon Web Services or locally using a tool like VirtualBox. For this tutorial, we’ll use VirtualBox. You can download and install VirtualBox from the official website.

https://www.virtualbox.org/

Install Vagrant

Next, we install Vagrant. Downloads are freely available on their website.

http://www.vagrantup.com/

For the remainder of this tutorial, I’m going to assume that you’ve been through the getting started training and are somewhat familiar with Vagrant.

Accommodate SSH Keys

UPDATE 6/26/2015: Vagrant introduced the unfortunate feature of producing a random key for each new VM as the default behavior. It’s possible to restore the original functionality (described below) and use the insecure key with the config.ssh.insert_key = false setting in a Vagrantfile.

Until (if ever) Vagrant defaults to using the insecure key, a system-wide workaround is to add a Vagrantfile to the local .vagrant.d folder, which will apply this setting to all VMs (see Load Order and Merging) unless otherwise overridden. The Vagrantfile can be as simple as this:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.ssh.insert_key = false
 
end

Vagrant creates an SSH key which it installs on guest hosts by default. This can be a huge time saver since it prevents the need for passwords. Since I use PuTTY on Windows, I needed to convert the SSH key and save a PuTTY session to accommodate connections. Use PuTTYgen to do this.

  1. Open PuTTYgen
  2. Click “Load”
  3. Navigate to the file C:\Users\watrous\.vagrant.d\insecure_private_key

PuTTYgen shows a dialog saying that the import was successful and displays the details of the key, as shown here:

import-vagrant-ssh-key-puttygen

Click “Save private key”. You will be prompted about saving the key without a passphrase, which in this case is fine, since it’s just for local development. If you end up using Vagrant to create public instances, such as using Amazon Web Services, you should use a more secure connection method. Give the key a unique name, like C:\Users\watrous\.vagrant.d\insecure_private_key-putty.ppk and save.

Finally, create a saved PuTTY session to connect to new Vagrant instances. Here are some of my PuTTY settings:

putty-session-vagrant-settings-1

putty-session-vagrant-settings-auth

The username may change if you choose a different base OS image from the vagrant cloud, but the settings shown above should work fine for this tutorial.

Get Ready to ‘vagrant up’

Create a directory where you can store the files Vagrant needs to spin up your environment. I’ll refer to this directory as VAGRANT_ENV.

To build a LEMP stack we need a few things. First is a Vagrantfile, where we identify the base OS (or box), ports, etc. This is a text file that follows Ruby language conventions. Create the file VAGRANT_ENV/Vagrantfile with the following contents:

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  # All Vagrant configuration is done here. The most common configuration
  # options are documented and commented below. For a complete reference,
  # please see the online documentation at vagrantup.com.
 
  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "ubuntu/trusty64"
  config.vm.provision :shell, path: "bootstrap.sh"
  config.vm.network :forwarded_port, host: 4567, guest: 80
  config.ssh.shell = "bash -c 'BASH_ENV=/etc/profile exec bash'"
end

This file chooses a 64-bit Trusty (14.04) Ubuntu box, forwards port 4567 on the host machine to port 80 on the guest machine and identifies a bootstrap shell script, which I show next.

Create VAGRANT_ENV/bootstrap.sh with the following contents:

#!/usr/bin/env bash
 
#accommodate proxy environments
#export http_proxy=http://proxy.company.com:8080
#export https_proxy=https://proxy.company.com:8080
apt-get -y update
apt-get -y install nginx
debconf-set-selections <<< 'mysql-server mysql-server/root_password password secret'
debconf-set-selections <<< 'mysql-server mysql-server/root_password_again password secret'
apt-get -y install mysql-server
#mysql_install_db
#mysql_secure_installation
apt-get -y install php5-fpm php5-mysql
sed -i 's/;cgi\.fix_pathinfo\s*=\s*1/cgi.fix_pathinfo=0/' /etc/php5/fpm/php.ini
service php5-fpm restart
mv /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak
cp /vagrant/default /etc/nginx/sites-available/default
service nginx restart
echo "<?php phpinfo(); ?>" > /usr/share/nginx/html/info.php

This script executes a sequence of shell commands as root after the new server is provisioned. The script must run without requiring user input, and it should handle any configuration changes and service restarts needed to make the environment ready to use.
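The least obvious line in the script is the sed expression, which disables cgi.fix_pathinfo in php.ini, a common hardening step when PHP runs behind nginx because it stops PHP from guessing at which file to execute. The transform can be checked in isolation against the stock php.ini line:

```shell
# show the effect of the bootstrap.sh sed expression on the stock php.ini line
echo ';cgi.fix_pathinfo=1' | sed 's/;cgi\.fix_pathinfo\s*=\s*1/cgi.fix_pathinfo=0/'
# prints: cgi.fix_pathinfo=0
```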

More sophisticated tools like Ansible, Chef and Puppet can also be used.

You may have noticed that the above script expects a modified version of nginx’s default configuration. Create the file VAGRANT_ENV/default with the following contents:

server {
	listen 80 default_server;
	listen [::]:80 default_server ipv6only=on;
 
	root /usr/share/nginx/html;
	index index.php index.html index.htm;
 
	server_name localhost;
 
	location / {
		try_files $uri $uri/ =404;
	}
 
	error_page 404 /404.html;
 
	error_page 500 502 503 504 /50x.html;
	location = /50x.html {
		root /usr/share/nginx/html;
	}
 
	location ~ \.php$ {
		fastcgi_split_path_info ^(.+\.php)(/.+)$;
		fastcgi_pass unix:/var/run/php5-fpm.sock;
		fastcgi_index index.php;
		include fastcgi_params;
	}
}

vagrant up

Now it’s time to run ‘vagrant up‘. To do this, open a console window and navigate to your VAGRANT_ENV directory, then run ‘vagrant up’.

vagrant-up-console

If this is the first time you have run ‘vagrant up’, it may take a few minutes to download the ‘box’. Once it’s done, you should be ready to visit your PHP page rendered by nginx on a local virtual machine created and configured by Vagrant:

http://127.0.0.1:4567/info.php