hdfs – Daniel Watrous on Software and Cloud Engineering

In a previous post I demonstrated a method to deploy a multi-node Hadoop cluster using Vagrant and Ansible. This post builds on that and shows how to deploy a Hadoop cluster with an arbitrary number of slave nodes on OpenStack in minutes. The process uses Heat, the OpenStack orchestration layer, to provision the resources, after which Ansible is used to configure them. All the scripts you need to do this yourself are available on GitHub (https://github.com/dwatrous/hadoop-multi-server-ansible) to clone and fork. I have recorded a video demonstrating the entire process, including scaling the......
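The handoff from Heat to Ansible can be illustrated with a small Python sketch. Once the stack has provisioned a master and some number of slaves, their addresses can be written into an Ansible inventory, so a larger cluster only means a longer list. The function name `build_inventory`, the group names `hadoop-master` and `hadoop-slaves`, and the IP addresses are hypothetical. They are not taken from the linked repository.

```python
"""Hypothetical sketch: build an Ansible INI inventory for a Hadoop cluster
with one master and an arbitrary number of slave nodes."""

from typing import List


def build_inventory(master_ip: str, slave_ips: List[str]) -> str:
    """Return an INI-format Ansible inventory string.

    The group names [hadoop-master] and [hadoop-slaves] are illustrative
    assumptions, not necessarily those used by the original scripts.
    """
    lines = ["[hadoop-master]", master_ip, "", "[hadoop-slaves]"]
    # Any number of slaves (including zero) is simply appended to the group.
    lines.extend(slave_ips)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example addresses only; in practice they would come from the Heat stack.
    master = "10.0.0.10"
    slaves = [f"10.0.0.{i}" for i in range(11, 14)]
    print(build_inventory(master, slaves))
```

In this sketch, scaling out amounts to regenerating the inventory with more slave addresses and re-running the configuration step against it.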


I’ve recently been involved with several groups interested in using Hadoop to process large sets of data, including higher-level abstractions on top of Hadoop such as Pig and Hive. What has surprised me most is that no one is automating their installation of Hadoop. In each case I’ve observed, they start by manually provisioning some servers and then follow a series of tutorials to install and configure a cluster by hand. Setting up a cluster typically seems to take about a week. There is often a lot of wasted......


My previous Hadoop example operated against the local filesystem, even though I had formatted a local HDFS partition. To operate against the local HDFS partition, it’s necessary to first start the namenode and datanode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I hadn’t done yet:

# Format the namenode
hdfs namenode -format
# Start the namenode
hdfs namenode
# Start a datanode
hdfs datanode
......
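The three commands above could also be driven from a script. The Python sketch below is an illustration, not the author's method. It assumes a single-machine Hadoop install with the `hdfs` launcher on PATH. Because `hdfs namenode` and `hdfs datanode` run in the foreground, the sketch starts them as background child processes after the one-shot format step. The names `planned_commands` and `start_hdfs` are hypothetical.

```python
"""Hypothetical sketch: format HDFS, then start a namenode and a datanode.

Assumes a single-machine Hadoop install with the `hdfs` launcher on PATH.
"""

import subprocess
from typing import List

# The same commands as in the post, split into argument lists.
FORMAT_CMD: List[str] = ["hdfs", "namenode", "-format"]
DAEMON_CMDS: List[List[str]] = [
    ["hdfs", "namenode"],  # start the namenode
    ["hdfs", "datanode"],  # start a datanode
]


def planned_commands() -> List[List[str]]:
    """Return the commands in the order they should run (no side effects)."""
    return [FORMAT_CMD] + DAEMON_CMDS


def start_hdfs() -> List[subprocess.Popen]:
    """Format the namenode, then launch both daemons in the background.

    Formatting initializes fresh namenode metadata, so only do this on a
    new or disposable HDFS setup.
    """
    # One-shot command: wait for it and raise CalledProcessError on failure.
    subprocess.run(FORMAT_CMD, check=True)
    # The daemons keep running in the foreground, so start them without
    # waiting and hand back the process handles to the caller.
    return [subprocess.Popen(cmd) for cmd in DAEMON_CMDS]


if __name__ == "__main__":
    # Print the plan only; call start_hdfs() on a machine with Hadoop installed.
    for cmd in planned_commands():
        print(" ".join(cmd))
```

Returning the `Popen` handles lets the caller later stop the daemons (for example with `terminate()`) instead of leaving them untracked.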


Tag: hdfs

October 22, 2015

Software Engineering

2 Comments

Build a multi-server Hadoop cluster in OpenStack in minutes

Daniel Watrous

October 1, 2015

Software Engineering

20 Comments

Install and configure a Multi-node Hadoop cluster using Ansible

Daniel Watrous

November 15, 2013

Software Engineering

No Comments

Hadoop HDFS in Standalone Mode

Daniel Watrous