A collection of Software and Cloud patterns with a focus on the Enterprise

Tag: mapreduce


In a previous post I demonstrated a method to deploy a multi-node Hadoop cluster using Vagrant and Ansible. This post builds on that and shows how to deploy a Hadoop cluster with an arbitrary number of slave nodes in minutes on OpenStack. The process uses Heat, the OpenStack orchestration layer, to provision the resources, after which Ansible is used to configure them. All the scripts to do this yourself are available on GitHub (https://github.com/dwatrous/hadoop-multi-server-ansible) to clone and fork. I have recorded a video demonstrating the entire process, including scaling the......



I’ve recently been involved with several groups interested in using Hadoop to process large sets of data, including the use of higher-level abstractions on top of Hadoop like Pig and Hive. What has surprised me most is that no one is automating their installation of Hadoop. In each case I’ve observed, they start by manually provisioning some servers and then follow a series of tutorials to manually install and configure a cluster. Setting up a cluster this way typically seems to take about a week. There is often a lot of wasted......
