Using Vagrant to Explore Ansible

Last week I wrote about Vagrant, a fantastic tool for spinning up virtual development environments. Today I’m exploring Ansible, an open source tool that streamlines certain system administration tasks. Unlike Vagrant, which provisions new machines, Ansible takes an already provisioned machine and configures it. This can include installing and configuring software, managing services, […]

Using Vagrant to build a LEMP stack

I may have just fallen in love with Vagrant. Vagrant makes it possible to quickly create a virtual environment for development. It differs from cloning or snapshots in that it starts from a minimal base OS and provides a provisioning mechanism to set up and configure the environment exactly the way you want for development. […]

Craft vs. Delivery in Software Development

As a software engineer, I love well-crafted software. Carefully chosen abstractions, effective use of patterns, and thorough test coverage all increase business value. Craft takes time and requires skill, along with the proper tools and resources. Unfortunately, I frequently find myself frustrated that business partners see value differently and care only about delivering a fixed set of […]

Load Testing with Locust.io

I’ve recently done some load testing using Locust.io. The setup was more complicated than with other tools, and I didn’t find it well documented on the Locust.io site. Here’s how I got Locust.io running on two different Linux platforms. Locust.io on Red Hat Enterprise Linux (RHEL): Naturally, these instructions will work on CentOS too. sudo yum […]

Load Testing Alternatives for Large Scale Web Applications

Load testing web applications matters in an age of web-scale traffic. There are countless ways to drive traffic to a website, and when one of them goes right (such as a Slashdot mention or a piece of viral content), it can produce an enormous load in a very short time. Building and testing large-scale software […]

Detecting Credit Card Fraud – Frequency Algorithm

About 13 years ago I created my first integration with Authorize.net for a client who wanted to accept credit card payments directly on his website. The internet has changed a lot since then and the frequency of fraud attempts has increased. One credit card fraud signature I identified while reviewing my server logs for one […]
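
The excerpt stops before naming the signature, so what follows is only a hypothetical sketch of a frequency check in Python, not the algorithm from the post. It assumes the suspicious pattern is a single source (keyed here by IP address) making more than `max_attempts` payment attempts within `window_seconds`; the class name and both thresholds are made up for illustration.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class FrequencyFlagger:
    """Hypothetical sketch: flag a source that makes too many payment
    attempts inside a sliding time window. The threshold and window size
    are illustrative assumptions, not values from the original post."""

    def __init__(self, max_attempts: int = 3, window_seconds: float = 60.0):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        # Per-source timestamps of recent attempts, oldest first.
        self._attempts: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, source: str, timestamp: float) -> bool:
        """Record one attempt; return True if the source now looks suspicious."""
        window = self._attempts[source]
        window.append(timestamp)
        # Drop attempts that have fallen out of the sliding window.
        while window and timestamp - window[0] > self.window_seconds:
            window.popleft()
        return len(window) > self.max_attempts


if __name__ == "__main__":
    flagger = FrequencyFlagger(max_attempts=3, window_seconds=60)
    # Four attempts from one (documentation-range) IP within 15 seconds.
    print([flagger.record("203.0.113.7", t) for t in (0, 5, 10, 15)])
    # prints [False, False, False, True]
```

A deque per source keeps each check proportional to the number of recent attempts. A long-running version would also need to evict idle sources, since this sketch's dictionary only grows.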

Analyze Tomcat Logs Using Pig (Hadoop)

In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using Pig. The motivation behind Pig is the ability to use a descriptive language to analyze large data sets, rather than writing code to process them using Java or Python […]
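
In Pig, a per-level count boils down to a LOAD, a GROUP ... BY, and a COUNT. For contrast, here is a minimal sketch of that kind of aggregation written procedurally in Python. Both the analysis and the log format are assumptions (the excerpt doesn’t say which metric the post computes): the hypothetical `count_levels` helper simply takes the first java.util.logging level keyword (SEVERE, WARNING, INFO, and so on) found on each line of catalina.out.

```python
import re
from collections import Counter
from typing import Iterable

# java.util.logging level names, which Tomcat's JULI logging uses.
# Assumption: the first such keyword on a line is that line's level.
LEVEL_PATTERN = re.compile(r"\b(SEVERE|WARNING|INFO|CONFIG|FINEST|FINER|FINE)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count log lines per level."""
    counts: Counter = Counter()
    for line in lines:
        match = LEVEL_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines, not taken from a real catalina.out.
    sample = [
        "05-Jan-2013 10:11:12.123 INFO [main] org.apache.catalina.startup.Catalina.start Server startup in 1234 ms",
        "05-Jan-2013 10:11:13.456 SEVERE [main] org.apache.catalina.core.StandardContext.startInternal Context startup failed",
        "INFO: Deploying web application directory ROOT",
    ]
    print(count_levels(sample))
    # prints Counter({'INFO': 2, 'SEVERE': 1})
```

Everything Pig expresses declaratively in a grouping statement (reading, parsing, bucketing, counting) has to be spelled out by hand here, and this version runs on a single machine rather than as a distributed job.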

Hadoop Scripts in Python

I read that Hadoop supports scripts written in languages other than Java, such as Python. Since I’m a fan of Python, I wanted to prove this out. It was my good fortune to find an excellent post by Michael Noll that walked me through the entire process of scripting in Python for Hadoop. It’s […]
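
Hadoop Streaming runs the mapper and reducer as ordinary programs: each reads records from standard input and writes tab-separated key/value lines to standard output, and Hadoop sorts the mapper output by key before the reducer sees it. Below is a minimal word-count sketch of that contract in Python (my own illustration, not the code from Michael Noll’s post). The function names are hypothetical, and `run_local` uses `sorted()` to stand in for Hadoop’s shuffle-and-sort so the logic can be tried without a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per word, the key/value format
    Hadoop Streaming reads from a mapper's stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word. Relies on records arriving sorted
    by key, which Hadoop guarantees between the map and reduce phases."""
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Local dry run: sorted() stands in for Hadoop's shuffle-and-sort."""
    return list(reducer(sorted(mapper(lines))))


if __name__ == "__main__":
    print(run_local(["the quick fox", "the lazy dog"]))
    # prints ['dog\t1', 'fox\t1', 'lazy\t1', 'quick\t1', 'the\t2']
```

On a real cluster the mapper and reducer would live in two separate scripts that loop over `sys.stdin`, passed to the streaming jar with its `-mapper` and `-reducer` options.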

Hadoop Takes Compressed Files (gzip, bzip2) as Direct Input

The log rotation mechanism on my servers automatically compresses each rotated log file with gzip to save disk space. I discovered that Hadoop is already designed to handle compressed input: gzip and bzip2 work out of the box, and LZO works once its separately distributed codec is installed. This means the Mapper class needs no additional code to decompress anything. Here’s […]
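
Hadoop chooses a decompression codec from the input file’s extension (its `CompressionCodecFactory` maps `.gz` to gzip and `.bz2` to bzip2), which is why the Mapper only ever sees plain lines. As a rough local analogy rather than Hadoop code, here’s a Python sketch that does the same extension-based dispatch with the standard `gzip` and `bz2` modules; `open_log` and `read_lines` are hypothetical helper names.

```python
import bz2
import gzip
from typing import IO, Iterator

# Map file extensions to openers, loosely mirroring how Hadoop picks a
# codec by file suffix. This is an analogy, not Hadoop's implementation.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
}


def open_log(path: str) -> IO[str]:
    """Hypothetical helper: open a log as text, decompressing by extension."""
    for suffix, opener in OPENERS.items():
        if path.endswith(suffix):
            return opener(path, "rt", encoding="utf-8")
    # No known compression suffix: treat it as a plain text file.
    return open(path, "rt", encoding="utf-8")


def read_lines(path: str) -> Iterator[str]:
    """Yield lines without trailing newlines, whatever the compression."""
    with open_log(path) as handle:
        for line in handle:
            yield line.rstrip("\n")
```

One related detail: a gzip file can’t be split, so Hadoop hands each `.gz` file to a single mapper, whereas bzip2 input can be split across several.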

Hadoop HDFS in Standalone Mode

My previous Hadoop example operated against the local filesystem, even though I had formatted a local HDFS partition. To operate against the local HDFS partition, it’s necessary to first start the NameNode and DataNode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I […]