Detecting Credit Card Fraud – Frequency Algorithm

About 13 years ago I created my first integration with Authorize.net for a client who wanted to accept credit card payments directly on his website. The internet has changed a lot since then and the frequency of fraud attempts has increased. One credit card fraud signature I identified while reviewing my server logs for one […]

Read more

Analyze Tomcat Logs using PIG (hadoop)

In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using PIG. The motivation behind PIG is the ability us a descriptive language to analyze large sets of data rather than writing code to process it, using Java or Python […]

Read more

Hadoop Scripts in Python

I read that Hadoop supports scripts written in various languages other than Java, such as Python. Since I’m a fan of python, I wanted to prove this out. It was my good fortune to find an excellent post by Michael Noll that walked me through the entire process of scripting in Python for Hadoop. It’s […]

Read more

Hadoop Takes Compressed Files (gzip, bzip2) as Direct Input

The log rotation mechanism on my servers automatically compresses (gzip) the rotated log file to save on disk space. I discovered that Hadoop is already designed to deal with compressed files using gzip, bzip2 and LZO out of the box. This means that no additional work is required in the Mapper class to decompress. Here’s […]

Read more

Hadoop HDFS in Standalone Mode

My previous hadoop example operated against the local filesystem, in spite of the fact that I formatted a local HDFS partition. In order to operate against the local HDFS partition it’s necessary to first start the namenode and datanode. I mostly followed these instructions to start those processes. Here’s the most relevant part that I […]

Read more

Use Hadoop to Analyze Java Logs (Tomcat catalina.out)

One of the Java applications I develop deploys in Tomcat and is load-balanced across a couple dozen servers. Each server can produce gigabytes of log output daily due to the high volume. This post demonstrates simple use of hadoop to quickly extract useful and relevant information from catalina.out files using Map Reduce. I followed Hadoop: […]

Read more

Override Java Methods on Instantiation

Most Java programmers are very familiar with the mechanism to extend a class. To do this, you simply create a new class and specify that it extends another class. You can then add funtionality not available in the original class, AND you can also override any functionality that existed already. Imagine a simple class public […]

Read more

Caching in Java using Aspect Oriented Programming (AOP)

As the scale of web applications increases, performance optimization considerations are more frequently included in initial design. One optimization technique used extensively is caching. A cache contains pre-processed data that is ready to use without redoing the processing. Processing may include extensive computation and accessing data over a network or on disk. Keep the details […]

Read more

Using Java to work with Versioned Data

A few days ago I wrote about how to structure version details in MongoDB. In this and subsequent articles I’m going to present a Java based approach to working with that revision data. I have published all this work as an open source repository on github. Feel free to fork it: https://github.com/dwatrous/mongodb-revision-objects Design Decisions To […]

Read more

Representing Revision Data in MongoDB

The need to track changes to web content and provide for draft or preview functionality is common to many web applications today. In relational databases it has long been common to accomplish this using a recursive relationship within a single table or by splitting the table out and storing version details in a secondary table. […]

Read more