In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using PIG. The motivation behind PIG is the ability to use a descriptive language to analyze large sets of data rather than writing code to process them, using Java or Python for example. PIG Latin is the descriptive query language and has some similarities with SQL. These include grouping and filtering.

Load in the data

First I launch into the interactive local PIG command line, grunt. Commands are not......
Continue Reading
One of the Java applications I develop deploys in Tomcat and is load-balanced across a couple dozen servers. Each server can produce gigabytes of log output daily due to the high volume. This post demonstrates a simple use of Hadoop to quickly extract useful and relevant information from catalina.out files using MapReduce. I followed Hadoop: The Definitive Guide for setup and example code.

Installing Hadoop

Hadoop in standalone mode was the most convenient for initial development of the MapReduce classes. The following commands were executed on a virtual server running Red Hat Enterprise......
Continue Reading
Most Java programmers are very familiar with the mechanism to extend a class. To do this, you simply create a new class and specify that it extends another class. You can then add functionality not available in the original class, AND you can also override any functionality that already existed. Imagine a simple class:

    public class Message {
        private String message;

        public Message(String message) {
            this.message = message;
        }

        public void showMessage() {
            System.out.println(message);
        }
    }

......
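The excerpt stops before showing a subclass, so here is a minimal sketch of what extending Message could look like. The names MessageDemo, LoudMessage, and showTwice are hypothetical and are not from the original post. Message is nested here only so that the example compiles as a single file.

```java
public class MessageDemo {

    // The Message class from the post, nested so the sketch fits in one file.
    static class Message {
        private String message;

        public Message(String message) {
            this.message = message;
        }

        public void showMessage() {
            System.out.println(message);
        }
    }

    // Hypothetical subclass: overrides existing behavior and adds a new method.
    static class LoudMessage extends Message {

        public LoudMessage(String message) {
            super(message); // the private field is set through the parent constructor
        }

        // Override: print a banner line, then reuse the parent implementation.
        @Override
        public void showMessage() {
            System.out.println("*** NOTICE ***");
            super.showMessage();
        }

        // New functionality that the original class does not offer.
        public void showTwice() {
            showMessage();
            showMessage();
        }
    }

    public static void main(String[] args) {
        Message plain = new Message("hello");
        plain.showMessage();

        // A LoudMessage can be used anywhere a Message is expected.
        Message loud = new LoudMessage("hello");
        loud.showMessage();
    }
}
```

Because the message field is private, the subclass cannot read it directly. It relies on super.showMessage() instead, which is one reason a parent class might expose a getter or a protected field.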
Continue Reading
As the scale of web applications increases, performance optimization is more frequently considered in the initial design. One optimization technique used extensively is caching. A cache contains pre-processed data that is ready to use without redoing the processing. Processing may include extensive computation and accessing data over a network or on disk.

Keep the details out of your code

One design consideration when introducing a caching mechanism in your code is to keep the details out of your code. Most caches are just simple key-value stores, and so it’s tempting to introduce......
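One way to keep cache details out of application code is to hide the store behind a small interface. The sketch below is only an illustration under that assumption and is not the approach from the full post. Cache, InMemoryCache, and CachingLookup are hypothetical names, and the code assumes Java 8 or later.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class CacheSketch {

    // Application code depends only on this interface, never on a concrete store.
    interface Cache<K, V> {
        Optional<V> get(K key);
        void put(K key, V value);
    }

    // One possible backing store; it could be swapped for a networked cache later.
    static class InMemoryCache<K, V> implements Cache<K, V> {
        private final Map<K, V> store = new ConcurrentHashMap<>();

        @Override
        public Optional<V> get(K key) {
            return Optional.ofNullable(store.get(key));
        }

        @Override
        public void put(K key, V value) {
            store.put(key, value); // ConcurrentHashMap rejects null keys and values
        }
    }

    // Wraps an expensive computation; callers never see the cache at all.
    static class CachingLookup<K, V> {
        private final Cache<K, V> cache;
        private final Function<K, V> loader;
        int loads = 0; // counts how often the expensive loader actually ran

        CachingLookup(Cache<K, V> cache, Function<K, V> loader) {
            this.cache = cache;
            this.loader = loader;
        }

        V lookup(K key) {
            Optional<V> cached = cache.get(key);
            if (cached.isPresent()) {
                return cached.get();
            }
            V value = loader.apply(key);
            loads++;
            cache.put(key, value);
            return value;
        }
    }

    public static void main(String[] args) {
        CachingLookup<String, Integer> lengths =
                new CachingLookup<String, Integer>(new InMemoryCache<String, Integer>(), String::length);
        System.out.println(lengths.lookup("tomcat")); // computed by the loader
        System.out.println(lengths.lookup("tomcat")); // served from the cache
        System.out.println("loader calls: " + lengths.loads);
    }
}
```

With this shape, replacing the in-memory map with another key-value store means writing one new Cache implementation, and the calling code stays unchanged.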
Continue Reading
A few days ago I wrote about how to structure version details in MongoDB. In this and subsequent articles I’m going to present a Java-based approach to working with that revision data. I have published all this work as an open source repository on GitHub. Feel free to fork it: https://github.com/dwatrous/mongodb-revision-objects

Design Decisions

To begin with, here are a few design rules that should direct the implementation:

- Program to interfaces. Choice of datastore or other technologies should not be visible in application code.
- Application code should never deal with versioned objects. It......
Continue Reading
Last week I spent way too much time integrating Apache Wicket and Google Guice. Yikes! The most difficult part for me was getting the initialization to happen in the right order. A big Thank You to Dan Retzlaff on the Wicket list for helping work through these details. The details below were applied to a Wicket quickstart project for Wicket 6.0.0.

Design Decisions

It was important to me to keep the application tier separate from the web tier. I actually maintain each in a separate repository. I have several motivations for this, such as:......
Continue Reading
I’ve been refactoring an application recently to move away from a proprietary and inflexible in-memory datastore. The drawbacks of the proprietary datastore included the fact that the content was static. The only way to update data involved a build and replication process that took much longer than the stakeholders were willing to wait. The main selling point in favor of the in-memory datastore was that it is blazing fast. And I mean blazing fast. My choice for a replacement datastore technology is MongoDB. MongoDB worked great, but the profiling and performance......
Continue Reading
In a previous article I demonstrated one way to create a RESTful interface using a plain Java Servlet. In this article I wanted to extend that to include JSON serialization using Jackson. I found a very simple article showing a basic case mapping a POJO to JSON and back again. However, when I copied this straight over I got the following error:

org.codehaus.jackson.map.JsonMappingException: No serializer found for class DataClass and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS)

......
Continue Reading
For a recent project I found that a RESTful interface would be appropriate. My first inclination was to use Jersey (or one of the other JAX-RS implementations available). The environment where this new REST API would deploy is still using Java 1.5. This became a major roadblock when I found that none of the JAX-RS implementations provide support for the Java 1.5 virtual machine. This is not surprising since it’s a few YEARS past EOSL (end of support life) for Java 1.5, but disappointing all the same. After spending a day or so with......
Continue Reading
One disappointment of developing for Wicket and Google App Engine (GAE) is that the automatic monitoring and reloading of modified HTML files didn’t work. It had something to do with the single-threaded nature of the GAE platform. I had found a few previous efforts to make this work, but none of them worked with the current version of Wicket and GAE. I went without it for a while, but restarting the web server after every markup change finally drove me to figure it out. Working with the project that I set up using......
Continue Reading