Today: January 15, 2025 3:48 am
A collection of Software and Cloud patterns with a focus on the Enterprise

Tag: hdfs


In a previous post I demonstrated a method to deploy a multi-node Hadoop cluster using Vagrant and Ansible. This post builds on that and shows how to deploy a Hadoop cluster with an arbitrary number of slave nodes in minutes on OpenStack. This process makes use of the OpenStack orchestration layer HEAT to provision the resources, after which Ansible use used to configure those resources. All the scripts to do this yourself is available on github to clone and fork: https://github.com/dwatrous/hadoop-multi-server-ansible I have recorded a video demonstrating the entire process, including scaling the......

Continue Reading


I’ve recently been involved with several groups interested in using Hadoop to process large sets of data, including use of higher level abstractions on top of Hadoop like Pig and Hive. What has surprised me most is that no one is automating their installation of Hadoop. In each case that I’ve observed they start by manually provisioning some servers and then follow a series of tutorials to manually install and configure a cluster. The typical experience seems to take about a week to setup a cluster. There is often a lot of wasted......

Continue Reading


My previous hadoop example operated against the local filesystem, in spite of the fact that I formatted a local HDFS partition. In order to operate against the local HDFS partition it’s necessary to first start the namenode and datanode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I hadn’t done yet. # Format the namenode hdfs namenode -format # Start the namenode hdfs namenode # Start a datanode hdfs datanode# Format the namenode hdfs namenode -format # Start the namenode hdfs namenode # Start a datanode......

Continue Reading