Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged command line

Software Engineering

Install Stackato (CloudFoundry) on HPCloud

I recently published an article to get CloudFoundry running by way of Stackato on a local machine using VirtualBox. As soon as you have something ready to share with the world (testers, executives, investors, etc.), you’ll want something more public. Fortunately, it’s easy to run Stackato on HPCloud.com. I’m following the steps outlined here: https://docs.stackato.com/admin/server/hpcs.html.

Configuration of HPCloud.com

For the security groups, I created two separate groups, one for SSH and another for web. I do this to allow for separation of access and web service functions in the future (using a bastion host, for example).


Launch an Instance

My settings for the instance are straightforward and shown below. I’m using a medium size instance and I check both the web and remote security groups that I created above. If you haven’t already added a key pair for secure remote access, do that before creating the instance. If you forget to do this, the default password for the stackato user is ‘stackato’, at least until you complete the setup of the first admin user.


And the security group assignments.


I follow the instructions as provided to associate a floating IP address with the new instance and create a DNS entry for easy access to my new Stackato installation. In my case I didn’t move DNS management to hpcloud.com. Instead I just added A and CNAME records where the domain was already being managed.

Role and name management

When I installed Stackato on VirtualBox previously, I had to rename the node to use wildcard DNS. In this case I need to rename it to use the domain name for which I created the A and CNAME records above.

For my scenario I ran the following two commands:

kato role remove mdns
kato node rename stackato.danielwatrous.com

You may see some errors about it being unable to resolve the default node name. You can safely ignore these. Restarting all the affected roles can take a few minutes.

Stackato uses HTTPS by default. However, if you are using the default certificate, you may have to bypass some security warnings in your browser.
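The rename matters because the API endpoint and every application URL hang off the node name, as the console session later in this post shows (api.stackato.danielwatrous.com and test-php.stackato.danielwatrous.com). Here is a small sketch of that naming pattern. The function name is my own invention for illustration and is not part of kato or the stackato client.

```python
def stackato_hostnames(node_name, app_names):
    """List the hostnames that need to resolve to the Stackato node.

    Hypothetical helper: it only mirrors the naming pattern seen in the
    console output (api.<node> plus <app>.<node> for each application).
    """
    hosts = ["api." + node_name]
    hosts += [app + "." + node_name for app in app_names]
    return hosts


for host in stackato_hostnames("stackato.danielwatrous.com", ["test-php"]):
    print(host)
```

Since each new app adds another hostname under the node name, a wildcard-style DNS record saves adding one record per app.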

Pushing code

You should be all set to push an app to the newly deployed Stackato instance. Let’s push a simple PHP app. Two files are required to push this using the stackato command line client. First is the PHP file, index.php. Next is the manifest yaml file. The manifest is required so that CloudFoundry can identify the correct buildpack to prepare the instance.

<?php phpinfo(); ?>



And a very basic manifest.yml file.

- name: test-php
  framework: php

The console session to deploy this simple app is shown below:

F:\cf-php-test>stackato target stackato.danielwatrous.com
Host redirects to: 'https://api.stackato.danielwatrous.com'
Successfully targeted to [https://api.stackato.danielwatrous.com]
Target:       https://api.stackato.danielwatrous.com
Organization: <none>
Space:        <none>
F:\cf-php-test>stackato login watrous
Attempting login to [https://api.stackato.danielwatrous.com]
Password: ********
Successfully logged into [https://api.stackato.danielwatrous.com]
Choosing the one available organization: "DanielWatrous.com"
Choosing the one available space: "Explore"
Target:       https://api.stackato.danielwatrous.com
Organization: DanielWatrous.com
Space:        Explore
No license installed.
Using 4G of 4G.
F:\cf-php-test>stackato push
Would you like to deploy from the current directory ?  [Yn]:
Using manifest file "manifest.yml"
Application Deployed URL [test-php.stackato.danielwatrous.com]:
Application Url:   https://test-php.stackato.danielwatrous.com
Enter Memory Reservation [256]: 20
Enter Disk Reservation [2048]: 1024
Creating Application [test-php] as [https://api.stackato.danielwatrous.com -> DanielWatrous.com -> Explore -> test-php] ... OK
  Map https://test-php.stackato.danielwatrous.com ... OK
Create services to bind to 'test-php' ?  [yN]:
Uploading Application [test-php] ...
  Checking for bad links ...  OK
  Copying to temp space ...  OK
  Checking for available resources ...  OK
  Processing resources ... OK
  Packing application ... OK
  Uploading (231) ...  OK
Push Status: OK
Starting Application [test-php] ...
stackato[dea_ng]: Staging application
stackato[fence]: Created Docker container
stackato[fence]: Prepared Docker container
stackato[cloud_controller_ng]: Updated app 'test-php' -- {"console"=>true, "state"=>"STARTED"}
staging: -----> Downloaded app package (4.0M)
staging: ****************************************************************************
staging: * Using the legacy buildpack to stage a 'php' framework application.
staging: *
staging: * Note that the legacy buildpack is a migration tool to provide backwards
staging: * compatibility while moving from Stackato 2.x to Stackato 3.0.  It is not
staging: * updated with new features beyond what Stackato 2.10.6 supplied.
staging: *
staging: * Please use a non-legacy buildpack for any new code developed for Stackato!
staging: ****************************************************************************
staging: end of staging
staging: -----> Uploading droplet (4.0M)
stackato[dea_ng]: Uploading droplet
stackato[dea_ng]: Completed uploading droplet
stackato[fence]: Destroyed Docker container
stackato[fence.0]: Created Docker container
stackato[fence.0]: Prepared Docker container
stackato[dea_ng.0]: Launching web process: /home/stackato/startup
app[stderr.0]: AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using Set the 'ServerName' directive globally to suppress this message
stackato[dea_ng.0]: Instance is ready
http://test-php.stackato.danielwatrous.com/ deployed

At this point I can load my PHP test app, which is just phpinfo().



OpenStack REST API

There are some high quality resources that already cover the OpenStack API, so this is a YEA (yet another example) post. See the resources section below for some helpful links.

OpenStack APIs provide access to all OpenStack components, such as nova (compute), glance (VM images), swift (object storage), cinder (block storage), keystone (authentication) and neutron (networking). Authentication tokens are valid for a fixed duration, after which they expire and must be replaced. A token is issued by keystone and then presented to each service in the X-Auth-Token header. Services that are hosted on the same logical server are typically accessible over different ports.
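Because tokens expire, any tool that holds on to one needs a way to tell when it is stale. The following is a minimal sketch, assuming the expiry arrives as a UTC string in the form shown in the debug output in this post ("expires": "2014-08-21T20:09:21Z"). The function name is hypothetical.

```python
from datetime import datetime, timezone


def token_is_expired(expires, now=None):
    """Return True once the 'expires' timestamp of a token has been reached."""
    # Assumed format: ISO 8601 in UTC with a trailing Z, e.g. 2014-08-21T20:09:21Z
    expiry = datetime.strptime(expires, "%Y-%m-%dT%H:%M:%SZ")
    expiry = expiry.replace(tzinfo=timezone.utc)
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= expiry


# The token in this post was issued at 19:09:21 and expires one hour later.
issued_at = datetime(2014, 8, 21, 19, 9, 21, tzinfo=timezone.utc)
print(token_is_expired("2014-08-21T20:09:21Z", now=issued_at))
```

A client would typically run a check like this before each call and request a fresh token when it returns True.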

OpenStack APIs are RESTful, which means there are many ways to use them. In this post I’ll demonstrate three approaches that should provide clarity into their structure.

  • Command Line Interface (CLI)
  • cURL
  • REST Client

In this post I don’t cover programming against the REST APIs, but instead focus just on how they work. This work builds on my OpenStack development post.

Command Line Interface (CLI)

Command Line Interfaces used to manage OpenStack components make use of the REST APIs behind the scenes: a rather smart design choice on the part of the OpenStack community. This brings consistency to OpenStack management efforts and discourages disparity between standard tooling (CLI) and custom tooling (direct API access).

Credentials are required to access the REST APIs. For the command line client, these credentials are stored as environment variables. If you’re using DevStack, you can use the openrc script to automatically set up your environment.

source openrc admin admin
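To make the link between those environment variables and the API concrete, here is a rough sketch that turns them into the JSON body of a keystone v2.0 token request. I am assuming the variables are named OS_TENANT_NAME, OS_USERNAME and OS_PASSWORD, so check your own openrc. The body shape follows the token request shown in the debug output in this post, and the function name is hypothetical.

```python
import json
import os


def build_auth_body(env=os.environ):
    """Build the body for POST /v2.0/tokens from OS_* style variables."""
    # Assumed variable names; raises KeyError if one of them is not set.
    return {
        "auth": {
            "tenantName": env["OS_TENANT_NAME"],
            "passwordCredentials": {
                "username": env["OS_USERNAME"],
                "password": env["OS_PASSWORD"],
            },
        }
    }


# Made-up values so the example runs without a configured environment.
example_env = {
    "OS_TENANT_NAME": "admin",
    "OS_USERNAME": "admin",
    "OS_PASSWORD": "secret",
}
print(json.dumps(build_auth_body(example_env)))
```

Passing the environment in as a plain mapping keeps the function easy to try out without touching real credentials.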

Some clients support a debug option that will output full details about the request and response cycle. Raw request and response details can be helpful when learning the APIs or creating programmatic access libraries that wrap the APIs. Here’s an example that will list the flavors available.

$ nova --debug flavor-list
REQ: curl -i 'http://openstack.danielwatrous.com:5000/v2.0/tokens' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -d '{"auth": {"tenantName": "admin", "passwordCredentials": {"username": "admin", "password": "{SHA1}95397c42a173838417806ce19d78f133ae6baa24"}}}'
INFO (connectionpool:258) Starting new HTTP connection (1): proxy.company.com
DEBUG (connectionpool:375) Setting read timeout to 600.0
DEBUG (connectionpool:415) "POST http://openstack.danielwatrous.com:5000/v2.0/tokens HTTP/1.1" 200 6823
RESP: [200] CaseInsensitiveDict({'content-length': '6823', 'proxy-connection': 'Keep-Alive', 'vary': 'X-Auth-Token', 'server': 'Apache/2.4.7 (Ubuntu)', 'connection': 'Keep-Alive', 'date': 'Thu, 21 Aug 2014 19:09:21 GMT', 'content-type': 'application/json'})
RESP BODY: {"access": {"token": {"issued_at": "2014-08-21T19:09:21.692110", "expires": "2014-08-21T20:09:21Z", "id": "{SHA1}99ff604f28f5706bfd82a00c21e099cba7fafab2", "tenant": {"enabled": true, "description": null, "name": "admin", "id": "32c13e88d51e49179c28520f688fa74d"}}, "serviceCatalog": [{"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d", "internalURL": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d", "id": "03d570ce41c04daeb7ffa274c20435f0"}], "type": "compute", "name": "nova"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8776/v2/32c13e88d51e49179c28520f688fa74d", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8776/v2/32c13e88d51e49179c28520f688fa74d", "internalURL": "http://openstack.danielwatrous.com:8776/v2/32c13e88d51e49179c28520f688fa74d", "id": "20d2caebf4814e1bb2c05f30a4802a2c"}], "type": "volumev2", "name": "cinderv2"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8774/v3", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8774/v3", "internalURL": "http://openstack.danielwatrous.com:8774/v3", "id": "47f43a622264422f8980f3b0fbac5f00"}], "type": "computev3", "name": "novav3"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:3333", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:3333", "internalURL": "http://openstack.danielwatrous.com:3333", "id": "149e00e61cc543cf94ae6162f79d9f00"}], "type": "s3", "name": "s3"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:9292", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:9292", "internalURL": "http://openstack.danielwatrous.com:9292", 
"id": "1b7a45b6d1c840978491250fd1a67204"}], "type": "image", "name": "glance"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8000/v1", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8000/v1", "internalURL": "http://openstack.danielwatrous.com:8000/v1", "id": "0b8abc323d884a0aa657bcb2f0274ee5"}], "type": "cloudformation", "name": "heat-cfn"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8776/v1/32c13e88d51e49179c28520f688fa74d", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8776/v1/32c13e88d51e49179c28520f688fa74d", "internalURL": "http://openstack.danielwatrous.com:8776/v1/32c13e88d51e49179c28520f688fa74d", "id": "63675bf8e9a04d199cffafb7b8354b05"}], "type": "volume", "name": "cinder"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8773/services/Admin", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8773/services/Cloud", "internalURL": "http://openstack.danielwatrous.com:8773/services/Cloud", "id": "520de40a96ea47c4a08c3ae5e0a8243c"}], "type": "ec2", "name": "ec2"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:8004/v1/32c13e88d51e49179c28520f688fa74d", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:8004/v1/32c13e88d51e49179c28520f688fa74d", "internalURL": "http://openstack.danielwatrous.com:8004/v1/32c13e88d51e49179c28520f688fa74d", "id": "0dcff3f004ff4c9b9b96d012e47d2edb"}], "type": "orchestration", "name": "heat"}, {"endpoints_links": [], "endpoints": [{"adminURL": "http://openstack.danielwatrous.com:35357/v2.0", "region": "RegionOne", "publicURL": "http://openstack.danielwatrous.com:5000/v2.0", "internalURL": "http://openstack.danielwatrous.com:5000/v2.0", "id": "6f43e35702844e149dde900124c352bf"}], "type": "identity", "name": "keystone"}], "user": {"username": "admin", "roles_links": [], "id": 
"b9936b16c5d343588f5a19d31a55c1ea", "roles": [{"name": "_member_"}, {"name": "heat_stack_owner"}, {"name": "admin"}], "name": "admin"}, "metadata": {"is_admin": 0, "roles": ["9fe2ff9ee4384b1894a90878d3e92bab", "c28444beb7e64b4ea2ea223a6efcba6a", "3ea10423929b47779f977e11015fe480"]}}}
REQ: curl -i 'http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/detail' -X GET -H "Accept: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Project-Id: admin" -H "X-Auth-Token: {SHA1}99ff604f28f5706bfd82a00c21e099cba7fafab2"
INFO (connectionpool:258) Starting new HTTP connection (1): proxy.company.com
DEBUG (connectionpool:375) Setting read timeout to 600.0
DEBUG (connectionpool:415) "GET http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/detail HTTP/1.1" 200 3337
RESP: [200] CaseInsensitiveDict({'content-length': '3337', 'proxy-connection': 'Keep-Alive', 'x-compute-request-id': 'req-802ef8c9-d4a3-41e5-a93d-7ab2120089db', 'connection': 'Keep-Alive', 'date': 'Thu, 21 Aug 2014 19:09:24 GMT', 'content-type': 'application/json', 'age': '0'})
RESP BODY: {"flavors": [{"name": "m1.tiny", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/1", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/1", "rel": "bookmark"}], "ram": 512, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 1, "id": "1"}, {"name": "m1.small", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/2", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/2", "rel": "bookmark"}], "ram": 2048, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 20, "id": "2"}, {"name": "m1.medium", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/3", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/3", "rel": "bookmark"}], "ram": 4096, "OS-FLV-DISABLED:disabled": false, "vcpus": 2, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 40, "id": "3"}, {"name": "m1.large", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/4", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/4", "rel": "bookmark"}], "ram": 8192, "OS-FLV-DISABLED:disabled": false, "vcpus": 4, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 80, "id": "4"}, {"name": "m1.nano", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/42", "rel": "self"}, {"href": 
"http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/42", "rel": "bookmark"}], "ram": 64, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 0, "id": "42"}, {"name": "m1.heat", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/451", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/451", "rel": "bookmark"}], "ram": 512, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 0, "id": "451"}, {"name": "m1.xlarge", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/5", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/5", "rel": "bookmark"}], "ram": 16384, "OS-FLV-DISABLED:disabled": false, "vcpus": 8, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 160, "id": "5"}, {"name": "m1.micro", "links": [{"href": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d/flavors/84", "rel": "self"}, {"href": "http://openstack.danielwatrous.com:8774/32c13e88d51e49179c28520f688fa74d/flavors/84", "rel": "bookmark"}], "ram": 128, "OS-FLV-DISABLED:disabled": false, "vcpus": 1, "swap": "", "os-flavor-access:is_public": true, "rxtx_factor": 1.0, "OS-FLV-EXT-DATA:ephemeral": 0, "disk": 0, "id": "84"}]}
| ID  | Name      | Memory_MB | Disk | Ephemeral | Swap_MB | VCPUs | RXTX_Factor | Is_Public |
| 1   | m1.tiny   | 512       | 1    | 0         |         | 1     | 1.0         | True      |
| 2   | m1.small  | 2048      | 20   | 0         |         | 1     | 1.0         | True      |
| 3   | m1.medium | 4096      | 40   | 0         |         | 2     | 1.0         | True      |
| 4   | m1.large  | 8192      | 80   | 0         |         | 4     | 1.0         | True      |
| 42  | m1.nano   | 64        | 0    | 0         |         | 1     | 1.0         | True      |
| 451 | m1.heat   | 512       | 0    | 0         |         | 1     | 1.0         | True      |
| 5   | m1.xlarge | 16384     | 160  | 0         |         | 8     | 1.0         | True      |
| 84  | m1.micro  | 128       | 0    | 0         |         | 1     | 1.0         | True      |

The first two sections are calls to the REST APIs. The first is a call to the keystone service to authenticate and receive a token. Responses come as JSON due to the Accept header of application/json. If you look closely, you’ll see that the response actually included an access token and entry point URLs for each of the services that are integrated with keystone. These make up the serviceCatalog and in this case there are ten.

The second section is the actual call to the nova API. In this case it returns a list of eight flavors. The final section is a tabular view of the JSON response created by the nova command line client.
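The two-step flow above (authenticate, then call the service endpoint from the catalog) can be sketched without any network access by working on a trimmed copy of the keystone response. The helper name is my own, and the sample keeps only two of the ten catalog entries, with the token replaced by a placeholder.

```python
def find_public_url(access_response, service_type):
    """Return the first publicURL for a service type, or None if it is absent."""
    for service in access_response["access"]["serviceCatalog"]:
        if service["type"] == service_type:
            return service["endpoints"][0]["publicURL"]
    return None


# Trimmed copy of the keystone response shown in the debug output.
response = {
    "access": {
        "token": {"id": "TOKEN", "expires": "2014-08-21T20:09:21Z"},
        "serviceCatalog": [
            {
                "type": "compute",
                "name": "nova",
                "endpoints": [
                    {"publicURL": "http://openstack.danielwatrous.com:8774/v2/32c13e88d51e49179c28520f688fa74d"}
                ],
            },
            {
                "type": "image",
                "name": "glance",
                "endpoints": [{"publicURL": "http://openstack.danielwatrous.com:9292"}],
            },
        ],
    }
}

token = response["access"]["token"]["id"]
flavors_url = find_public_url(response, "compute") + "/flavors/detail"
print(flavors_url)
```

The resulting URL is the same one the nova client requested in its second call, and the token is what it sent in the X-Auth-Token header.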


cURL

If you look closely at the debug output of the examples above, you’ll see that the command line client prints each HTTP request as an equivalent cURL command. That means we can already see what the authentication call looks like. Calls made directly with cURL look similar. For example, here I call the keystone service to get a list of tenants.

$ curl -i -X GET http://openstack.danielwatrous.com:35357/v2.0/tenants -H "User-Agent: linux-command-line" -H "X-Auth-Token: TOKEN"
HTTP/1.1 200 OK
Date: Thu, 21 Aug 2014 20:05:39 GMT
Server: Apache/2.4.7 (Ubuntu)
Vary: X-Auth-Token
Content-Length: 546
Content-Type: application/json
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
{"tenants_links": [], "tenants": [{"description": null, "enabled": true, "id": "1b7f733fa1394b9fb96838d3d7c6feea", "name": "service"}, {"description": null, "enabled": true, "id": "298cfcec9e9e49858e9b8e83d6b7d14e", "name": "demo"}, {"description": null, "enabled": true, "id": "32c13e88d51e49179c28520f688fa74d", "name": "admin"}, {"description": null, "enabled": true, "id": "8536da0aee8149d48e1fe6078dade4bf", "name": "alt_demo"}, {"description": null, "enabled": true, "id": "e66e6a80a6014dd28c7b4c1fcad19448", "name": "invisible_to_admin"}]}
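The same request can be prepared with nothing but the Python standard library, which shows that the cURL call is just a URL plus a couple of headers. This sketch only builds the request and does not send it. The function name is hypothetical and TOKEN is a placeholder, as in the cURL example.

```python
import urllib.request


def build_tenants_request(base_url, token):
    """Prepare (but do not send) a GET request for the keystone tenant list."""
    return urllib.request.Request(
        base_url + "/tenants",
        headers={"X-Auth-Token": token, "Accept": "application/json"},
        method="GET",
    )


req = build_tenants_request("http://openstack.danielwatrous.com:35357/v2.0", "TOKEN")
print(req.get_method(), req.full_url)
# To send it, pass req to urllib.request.urlopen(); that needs a reachable
# keystone endpoint and a real token, so it is left out here.
```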

REST Client

On Windows, the tool Fiddler can be used to create REST calls. When Fiddler is first started, you may need to turn off capturing of traffic. You can do this from the File menu or by pressing F12. On the right side of the window, choose the Composer tab. There you can provide the URL, headers and other HTTP request details. Below you can see a call to keystone for tokens.


The response can be viewed by selecting the resulting request in the left pane and choosing the Inspectors tab in the right pane. The results can be viewed raw, as shown here.


Fiddler also provides various parsers, including JSON, to make the content easier to visualize.



Resources

The quality of the documentation available for OpenStack APIs is really amazing. Here are a couple of starting points for you.



Hadoop HDFS in Standalone Mode

My previous Hadoop example operated against the local filesystem, even though I had formatted a local HDFS partition. To operate against the local HDFS partition, it’s necessary to first start the namenode and datanode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I hadn’t done yet.

# Format the namenode
hdfs namenode -format
# Start the namenode
hdfs namenode
# Start a datanode
hdfs datanode

I was then ready to add some directories and data to the local HDFS partition. I got the idea of using http://www.gutenberg.org/ for sample data from this article.

[watrous@myhost ~]$ hdfs dfs -mkdir -p /user/watrous/gutenberg
13/11/15 16:27:30 INFO namenode.FSEditLog: Number of transactions: 4 Total time for transactions(ms): 1 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 14
[watrous@myhost ~]$ hdfs dfs -ls -R /
drwxr-xr-x   - watrous supergroup          0 2013-11-14 23:16 /user
drwxr-xr-x   - watrous supergroup          0 2013-11-14 23:16 /user/watrous
drwxr-xr-x   - watrous supergroup          0 2013-11-14 23:16 /user/watrous/gutenberg
[watrous@myhost ~]$ hdfs dfs -copyFromLocal /home/watrous/pg20417.txt /user/watrous/gutenberg
13/11/14 23:16:35 INFO hdfs.StateChange: BLOCK* allocateBlock: /user/watrous/gutenberg/pg20417.txt._COPYING_. BP-1860796918- blk_1073741825_1001{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[|RBW]]}
13/11/14 23:16:35 INFO datanode.DataNode: Receiving BP-1860796918- src: / dest: /
13/11/14 23:16:36 INFO DataNode.clienttrace: src: /, dest: /, bytes: 674570, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1914205347_1, offset: 0, srvID: DS-819252937-, blockid: BP-1860796918-, duration: 26478978
13/11/14 23:16:36 INFO BlockStateChange: BLOCK* addStoredBlock: blockMap updated: is added to blk_1073741825_1001{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[|RBW]]} size 0
13/11/14 23:16:36 INFO datanode.DataNode: PacketResponder: BP-1860796918-, type=LAST_IN_PIPELINE, downstreams=0:[] terminating
13/11/14 23:16:36 INFO hdfs.StateChange: DIR* completeFile: /user/watrous/gutenberg/pg20417.txt._COPYING_ is closed by DFSClient_NONMAPREDUCE_1914205347_1
[watrous@myhost ~]$ hdfs dfs -ls /user/watrous/gutenberg
Found 1 items
-rw-r--r--   3 watrous supergroup     674570 2013-11-14 23:16 /user/watrous/gutenberg/pg20417.txt

Accessing HDFS from MapReduce classes

It turns out that no modifications are required in the Mapper and Reducer classes to work with files in HDFS. When a new Path object is created with a single String, the constructor performs some analysis to determine if there is a scheme and authority. The global configuration is then used to make the path qualified, which includes applying the hdfs scheme if there isn’t one already provided. Here are two examples where I call the same MapReduce script with different paths, first a local path, then an HDFS path.

[watrous@myhost ~]$ hadoop jar HadoopExample.jar /opt/mount/input/sample/ ~/output.multiple
13/11/14 22:44:48 INFO mapred.MapTask: Processing split: file:/opt/mount/input/sample/catalina.out-20131027.gz:0+4416223
[watrous@myhost ~]$ hadoop jar HadoopExample.jar /user/watrous/log_myhost.com /user/watrous/output_myhost.com
13/11/15 17:07:21 INFO mapred.MapTask: Processing split: hdfs://localhost/user/watrous/log_myhost.com/catalina.out-20131029.gz:0+8924461

It is also possible to eliminate ambiguity and explicitly define the hdfs paths as shown here.

[watrous@myhost ~]$ hadoop jar HadoopExample.jar hdfs://localhost/user/watrous/log_myhost.com hdfs://localhost/user/watrous/output_myhost.com-withscheme
13/11/15 21:49:21 INFO mapred.MapTask: Processing split: hdfs://localhost/user/watrous/log_myhost.com/catalina.out-20131031.gz:0+8118929
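The path handling described in this section can be modeled in a few lines. This is a simplified sketch in Python and not Hadoop's Java Path class: it only handles absolute paths, and the default_fs parameter is my stand-in for the default filesystem that Hadoop reads from its configuration (the fs.defaultFS property).

```python
from urllib.parse import urlparse


def qualify(path, default_fs="hdfs://localhost"):
    """Keep an explicit scheme, otherwise prefix the default filesystem."""
    if urlparse(path).scheme:
        return path  # already qualified, e.g. hdfs://localhost/...
    # Simplification: assumes path is absolute (starts with a slash).
    return default_fs.rstrip("/") + path


print(qualify("/user/watrous/log_myhost.com"))
print(qualify("hdfs://localhost/user/watrous/log_myhost.com"))
print(qualify("/opt/mount/input/sample/", default_fs="file:///"))
```

The first two calls give the same hdfs://localhost/... path, which mirrors the log lines above where the job processed the same HDFS location with and without an explicit scheme. The third shows that the same unqualified path lands on the local filesystem when the default is a file scheme.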

View results in HDFS

To view the results from HDFS, first get the path to the results, then use something like -cat to output results to stdout.

[watrous@myhost ~]$ hdfs dfs -ls /user/watrous/output_myhost.com
Found 2 items
-rw-r--r--   3 watrous supergroup          0 2013-11-15 17:08 /user/watrous/output_myhost.com/_SUCCESS
-rw-r--r--   3 watrous supergroup      17893 2013-11-15 17:08 /user/watrous/output_myhost.com/part-r-00000

With the path it’s now easy to print our results.

[watrous@myhost ~]$ hdfs dfs -cat /user/watrous/output_myhost.com/part-r-00000
2013-10-26 19:18:05,669 1
2013-10-26 19:31:00,452 1
2013-10-27 06:09:33,748 11
2013-10-27 06:09:33,749 25
2013-10-27 06:09:33,750 26
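Once the output is on stdout it is easy to post-process. As a rough sketch, assuming each line is a timestamp key followed by whitespace and a count (as in the listing above), the lines can be parsed like this. The function name is hypothetical and the sample is a few lines copied from the output.

```python
def parse_counts(lines):
    """Parse 'key<whitespace>count' lines into (key, count) tuples."""
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # Split on the last run of whitespace, since the key itself has a space.
        key, count = line.rsplit(None, 1)
        results.append((key, int(count)))
    return results


sample = [
    "2013-10-26 19:18:05,669 1",
    "2013-10-27 06:09:33,748 11",
    "2013-10-27 06:09:33,749 25",
]
rows = parse_counts(sample)
busiest = max(rows, key=lambda row: row[1])
print(busiest)
```

Splitting from the right works whether the separator between key and count is a tab or a space.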