As a software engineer, I love well-crafted software. Carefully chosen abstractions, effective use of patterns, and thorough test coverage all increase business value. Craft takes time and requires skill, proper tools, and resources. Unfortunately, I frequently find myself frustrated that business partners see value differently and care only about delivering a fixed set of features quickly. While exploring Cloud Foundry last week I came across a video clip featuring a presentation by Jonathan Murray, EVP and Chief Technology Officer at Warner Music Group. One quote from that video clip (at about 14:30)......
Continue Reading
I’ve recently done some load testing using Locust.io. The setup was more complicated than with other tools, and I didn’t feel it was well documented on their site. Here’s how I got Locust.io running on two different Linux platforms. Locust.io on RedHat Enterprise Linux (RHEL) or CentOS Naturally, these instructions will work on CentOS too. sudo yum -y install python-setuptools python-devel sudo yum -y install libevent libevent-devel One requirement of Locust.io is ZeroMQ. I found instructions to install that on their site......
Continue Reading
Load testing web applications is a big deal in an era of web-scale traffic. There are countless ways to drive traffic to a website, and when one of them goes right (like a Slashdot mention or viral content), it can produce an enormous load in a very short time. Building and testing large-scale software requires being able to simulate high levels of load. There are compelling commercial and open source load testing options available. Three that span much of the functional landscape are HP’s LoadRunner, Locust.io and Apache JMeter. Let’s start with......
Continue Reading
About 13 years ago I created my first integration with Authorize.net for a client who wanted to accept credit card payments directly on his website. The internet has changed a lot since then, and the frequency of fraud attempts has increased. While reviewing the server logs for one of my e-commerce websites, I identified one consistent credit card fraud signature. I refer to this as a shotgun attack, since the hacker sends through hundreds of credit card attempts. Here’s how it works and what to look for. All requests from a single......
Continue Reading
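The shotgun pattern described above (hundreds of card attempts from one source) can be sketched as a simple log scan. This is an illustrative standalone Python sketch, not the post's actual detection code; the endpoint path, threshold, and window are hypothetical values.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Illustrative thresholds: this many payment attempts from one IP
# inside the window looks like a shotgun attack, not a real customer.
ATTEMPT_THRESHOLD = 20
WINDOW = timedelta(minutes=10)

def find_shotgun_ips(entries):
    """Flag source IPs with suspiciously many payment attempts.

    `entries` is an iterable of (timestamp, source_ip, path) tuples
    parsed from an access log (parsing itself is omitted here).
    """
    attempts = defaultdict(list)
    for ts, ip, path in entries:
        if path == "/payment":  # hypothetical payment endpoint
            attempts[ip].append(ts)
    flagged = set()
    for ip, stamps in attempts.items():
        stamps.sort()
        # Sliding window: count attempts that fall within WINDOW of stamps[i]
        for i in range(len(stamps)):
            j = i
            while j < len(stamps) and stamps[j] - stamps[i] <= WINDOW:
                j += 1
            if j - i >= ATTEMPT_THRESHOLD:
                flagged.add(ip)
                break
    return flagged
```

The key signal is volume per source per time window; a legitimate buyer retries a handful of times at most, while a card-testing script burns through a list of stolen numbers as fast as the gateway responds.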
In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using PIG. The motivation behind PIG is the ability to use a descriptive language to analyze large sets of data rather than writing code to process it, in Java or Python for example. Pig Latin is the descriptive query language and has some similarities with SQL, including grouping and filtering. Load in the data First I launch into the interactive local PIG command line, grunt. Commands are not......
Continue Reading
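The grouping and filtering mentioned above might look like this in Pig Latin. This is a hedged sketch, not the post's actual script; the relation names and the SEVERE filter are illustrative.

```pig
-- load raw catalina.out lines as single-field tuples
raw = LOAD 'catalina.out' USING TextLoader() AS (line:chararray);
-- FILTER: keep only lines that mention SEVERE errors
severe = FILTER raw BY line MATCHES '.*SEVERE.*';
-- GROUP and count, roughly SQL's GROUP BY + COUNT
grouped = GROUP severe ALL;
counts = FOREACH grouped GENERATE COUNT(severe);
DUMP counts;
```

Each statement builds a named relation from the previous one, which is the descriptive style PIG trades for hand-written MapReduce code.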
I read that Hadoop supports scripts written in various languages other than Java, such as Python. Since I’m a fan of Python, I wanted to prove this out. It was my good fortune to find an excellent post by Michael Noll that walked me through the entire process of scripting in Python for Hadoop. It’s an excellent post and worked as written for me in Hadoop 2.2.0. How Hadoop processes scripts from other languages (stdin/stdout) In order to accommodate scripts from other languages, Hadoop focuses on standard in (stdin) and standard out (stdout).......
Continue Reading
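The stdin/stdout contract described above can be sketched with a word-count mapper and reducer pair. This is a minimal standalone illustration of the streaming data flow, not Michael Noll's exact scripts.

```python
from itertools import groupby

def mapper(lines):
    """Map phase: emit 'word<TAB>1' for every word,
    the line format a streaming mapper writes to stdout."""
    for line in lines:
        for word in line.split():
            yield word + "\t1"

def reducer(sorted_pairs):
    """Reduce phase: sum the counts per word.

    Hadoop sorts mapper output by key before the reduce phase,
    so groupby over an already-sorted stream is sufficient here."""
    keyed = (pair.split("\t") for pair in sorted_pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))
```

In a real streaming job each function reads sys.stdin and prints to stdout as a separate script; chaining mapper → sort → reducer locally (e.g. `cat file | ./mapper.py | sort | ./reducer.py`) reproduces the same data flow.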
The log rotation mechanism on my servers automatically compresses (gzip) the rotated log file to save on disk space. I discovered that Hadoop is already designed to deal with compressed files using gzip, bzip2 and LZO out of the box. This means that no additional work is required in the Mapper class to decompress. Here’s a snippet from the MapReduce output that shows it: 13/11/15 22:01:46 INFO mapred.MapTask: Processing split: hdfs://localhost/user/watrous/log_myhost.com/catalina.out-20131103.gz:0+3058954 13/11/15 22:01:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer 13/11/15 22:01:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584) 13/11/15 22:01:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb:......
Continue Reading
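Hadoop selects the decompression codec from the file extension. The same idea in plain Python, choosing gzip transparently by filename, is sketched below; this is an analogy to Hadoop's behavior, not its actual codec machinery.

```python
import gzip

def open_log(path):
    """Open a log file for reading, decompressing transparently when the
    name ends in .gz — similar in spirit to Hadoop's extension-based
    codec selection, which spares the Mapper any decompression work."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")
```

Callers read lines the same way regardless of compression, which is exactly the convenience the Mapper class gets for free from Hadoop.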
My previous Hadoop example operated against the local filesystem, in spite of the fact that I had formatted a local HDFS partition. In order to operate against the local HDFS partition it’s necessary to first start the namenode and datanode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I hadn’t done yet. # Format the namenode hdfs namenode -format # Start the namenode hdfs namenode # Start a datanode hdfs datanode......
Continue Reading
One of the Java applications I develop deploys in Tomcat and is load-balanced across a couple dozen servers. Each server can produce gigabytes of log output daily due to the high volume. This post demonstrates simple use of Hadoop to quickly extract useful and relevant information from catalina.out files using MapReduce. I followed Hadoop: The Definitive Guide for setup and example code. Installing Hadoop Hadoop in standalone mode was the most convenient for initial development of the MapReduce classes. The following commands were executed on a virtual server running RedHat Enterprise......
Continue Reading
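The kind of extraction described above — pulling relevant information out of catalina.out — boils down to a map step that keys each line by log level and a reduce step that tallies the keys. The sketch below shows that shape in standalone Python; it is not the Java MapReduce classes from the post, and the log-line format it assumes is illustrative.

```python
import re
from collections import Counter

# Assumed catalina.out style, e.g. "Nov 15, 2013 10:01:46 PM ... SEVERE: ..."
LEVEL_RE = re.compile(r"\b(SEVERE|WARNING|INFO|FINE)\b")

def count_levels(lines):
    """Map step: extract the log level from each line that has one;
    reduce step: tally occurrences per level."""
    levels = (m.group(1) for m in map(LEVEL_RE.search, lines) if m)
    return Counter(levels)
```

On a cluster, the map step runs in parallel across the gigabytes of logs from all the load-balanced servers and Hadoop handles the shuffle and aggregation; the per-line logic stays this small.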
Most Java programmers are very familiar with the mechanism to extend a class. To do this, you simply create a new class and specify that it extends another class. You can then add functionality not available in the original class, AND you can also override any functionality that existed already. Imagine a simple class public class Message { private String message; public Message(String message) { this.message = message; } public void showMessage() { System.out.println(message); } }......
Continue Reading