Build a multi-server Hadoop cluster in OpenStack in minutes

In a previous post I demonstrated a method to deploy a multi-node Hadoop cluster using Vagrant and Ansible. This post builds on that and shows how to deploy a Hadoop cluster with an arbitrary number of slave nodes in minutes on OpenStack. The process uses the OpenStack orchestration layer, Heat, to provision the resources, after which Ansible is used to configure those resources. All the scripts to do this yourself are available on GitHub to clone and fork: https://github.com/dwatrous/hadoop-multi-server-ansible

I have recorded a video demonstrating the entire process, including scaling the cluster after initial deployment.

Scope

The scope of this article is to create a Hadoop cluster with an arbitrary number of slave nodes, which can be scaled up or down to accommodate changing workloads (see the stack-update sketch at the end of this post). The following diagram illustrates this:

[diagram: a single Hadoop master managing an arbitrarily sized group of slave nodes]

Build the servers

For convenience, this process still uses Vagrant to create a server that will function as the Heat and Ansible controller. It's also possible to create a server in OpenStack to fill this role; in that case you could simply use the bootstrap-master.sh script to configure that server. The steps to create the servers in OpenStack using Heat are as follows (a command-line sketch of these steps appears at the end of this post):

1. Install the OpenStack clients (we do this in a Python virtual environment)
2. Download and source the openrc file from your OpenStack environment
3. Use the OpenStack clients to get details about keypairs, images, networks, etc.
4. Update the Heat template for your environment
5. Use Heat to build your servers

Install and Run Hadoop

Once the servers are provisioned, it's time to install Hadoop. This is done using Ansible and can be run from the same host where Heat was used (the Vagrant-created server in this case). Ansible requires an inventory file to run. Since Heat is aware of the server resources it created, I added …
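The post is truncated at this point, but the idea it introduces can be sketched: because Heat knows the addresses of the servers it created, the Ansible inventory can be generated from the stack rather than written by hand. The snippet below is an illustration under assumptions, not the repository's actual script: it presumes the Heat template declares outputs named master_ip and slave_ips, with slave_ips joined into a comma-separated string, and those names may not match the real template.

    # Sketch: generate an Ansible inventory from Heat stack outputs.
    # Assumes the template declares outputs master_ip and slave_ips;
    # slave_ips is assumed to be a comma-separated string of addresses.
    STACK=hadoop-cluster

    # heat output-show prints the JSON-encoded value, so strip the quotes
    MASTER=$(heat output-show $STACK master_ip | tr -d '"')

    cat > hosts <<EOF
    [master]
    $MASTER

    [slaves]
    EOF

    heat output-show $STACK slave_ips | tr -d '"' | tr ',' '\n' >> hosts

With an inventory in place, the configuration run is the usual ansible-playbook -i hosts <playbook>.yml, with the playbook name taken from the repository.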
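Stepping back, here is the "Build the servers" list expressed as shell commands, as promised above. This is a sketch under assumptions: the template filename, stack name, and parameters (key_name, slave_count) are illustrative rather than taken from the repository, and the era-specific clients shown here can be swapped for the unified openstack client.

    # 1. Install the OpenStack clients in a Python virtual environment
    virtualenv venv
    source venv/bin/activate
    pip install python-novaclient python-glanceclient \
        python-neutronclient python-heatclient

    # 2. Download the openrc file from your OpenStack dashboard, then source it
    source openrc.sh

    # 3. Look up the details the Heat template needs
    nova keypair-list
    glance image-list
    neutron net-list

    # 4. Edit the Heat template with those values, then
    # 5. have Heat build the servers (filename and parameters are assumptions)
    heat stack-create hadoop-cluster -f hadoop-cluster.yaml \
        -P "key_name=my-key;slave_count=3"

    # Poll until the stack reaches CREATE_COMPLETE
    heat stack-list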
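Finally, the scaling promised in the scope section: with Heat, growing or shrinking the cluster is a stack update rather than a new deployment. The sketch below assumes the template sizes its group of slaves with a slave_count parameter, which is a guess at the template's interface rather than a documented name.

    # Scale from 3 to 5 slaves by updating the stack in place
    # (slave_count is an assumed parameter name; check the template)
    heat stack-update hadoop-cluster -f hadoop-cluster.yaml -P "slave_count=5"

After the update completes, regenerate the inventory and rerun the Ansible playbook so the new slaves are configured and join the cluster.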