Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged java


Analyze Tomcat Logs using PIG (hadoop)

In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using PIG.

The motivation behind PIG is the ability to use a descriptive language to analyze large sets of data, rather than writing code to process it in Java or Python, for example. Pig Latin is the descriptive query language and has some similarities with SQL, including grouping and filtering.

Load in the data

First I launch into the interactive local PIG command line, grunt. Commands are not case sensitive, but it can be helpful to distinguish function names from variables. I show all commands in CAPS. Since the catalina.out data is not in a structured format (csv, tab, etc.), I load each line as a chararray (string).

[watrous@myhost ~]$ pig -x local
grunt> raw_log_entries = LOAD '/opt/mount/input/sample/catalina.out' USING TextLoader AS (line:chararray);
grunt> illustrate raw_log_entries;
--------------------------------------------------------------------------------------------------------------------------------------------
| raw_log_entries     | line:chararray                                                                                                     |
--------------------------------------------------------------------------------------------------------------------------------------------
|                     | 2013-10-30 04:20:18,897 DEBUG component.JVFunctions  - JVList: got docc03931336Instant Ink 2 - Alert # q14373261 |
--------------------------------------------------------------------------------------------------------------------------------------------

Note that it is also possible to provide a directory and PIG will load all files in the given directory.

Use regular expressions to parse each line

Now that I have the data in, I want to split each line into fields. To do this in PIG I use regular expressions with the REGEX_EXTRACT_ALL function. Notice that I double escape regex symbols, such as \\s for space. In the command below, the FLATTEN turns the matched values into a tuple that can be matched up with the AS fields. I’m treating all fields as chararray.

grunt> logs_base = FOREACH raw_log_entries GENERATE
>> FLATTEN(
>> REGEX_EXTRACT_ALL(line, '^([0-9]{4}-[0-9]{2}-[0-9]{2}\\s[0-9:,]+)\\s([a-zA-Z]+)\\s+([a-zA-Z0-9.]+)\\s+(.*)$')
>> ) AS (
>> logDate:      chararray,
>> logLevel:     chararray,
>> logClass:     chararray,
>> logMessage:   chararray
>> );
grunt> illustrate logs_base;
-----------------------------------------------------------------------------------------------------------
| raw_log_entries     | line:chararray                                                                    |
-----------------------------------------------------------------------------------------------------------
|                     | 2013-11-08 04:26:27,966 DEBUG component.JVFunctions  - Visible Level Added :LEV1 |
-----------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------
| logs_base     | logDate:chararray       | logLevel:chararray      | logClass:chararray      | logMessage:chararray        |
-----------------------------------------------------------------------------------------------------------------------------
|               | 2013-11-08 04:26:27,966 | DEBUG                   | component.JVFunctions  | - Visible Level Added :LEV1 |
-----------------------------------------------------------------------------------------------------------------------------

Filter, Group and Generate the desired output

I want to report on the ERROR logs by timestamp. I first filter logs_base by the logLevel field, then group the filtered records by logDate. Next I use FOREACH to GENERATE a result set that includes each timestamp and a count of errors at that time. Finally I dump the results.

grunt> filtered_records = FILTER logs_base BY logLevel == 'ERROR';
grunt> grouped_records = GROUP filtered_records BY logDate;
grunt> log_count = FOREACH grouped_records GENERATE group, COUNT(filtered_records);
grunt> dump log_count
 
HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.0.0   0.12.0  watrous 2013-12-05 21:38:54     2013-12-05 21:39:15     GROUP_BY,FILTER
 
Success!
 
Job Stats (time in seconds):
JobId   Alias   Feature Outputs
job_local_0002  filtered_records,grouped_records,log_count,logs_base,raw_log_entries    GROUP_BY,COMBINER       file:/tmp/temp1196141656/tmp-135873072,
 
Input(s):
Successfully read records from: "/opt/mount/input/sample/catalina.out"
 
Output(s):
Successfully stored records in: "file:/tmp/temp1196141656/tmp-135873072"
 
Job DAG:
job_local_0002
 
 
2013-12-05 21:39:15,813 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2013-12-05 21:39:15,814 [main] WARN  org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2013-12-05 21:39:15,815 [main] INFO  org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2013-12-05 21:39:15,815 [main] INFO  org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2013-11-08 04:04:51,894,2)
(2013-11-08 05:04:52,711,2)
(2013-11-08 05:33:23,073,3)
(2013-11-08 06:04:53,689,2)
(2013-11-08 07:04:54,366,3)
(2013-11-08 08:04:55,096,2)
(2013-11-08 13:34:28,936,2)
(2013-11-08 17:32:31,629,3)
(2013-11-08 18:50:56,971,1)
(2013-11-08 18:50:56,980,1)
(2013-11-08 18:50:56,986,1)
(2013-11-08 18:50:57,008,1)
(2013-11-08 18:50:57,017,1)
(2013-11-08 18:50:57,024,1)
(2013-11-08 18:51:17,357,1)
(2013-11-08 18:51:17,423,1)
(2013-11-08 18:51:17,491,1)
(2013-11-08 18:51:17,499,1)
(2013-11-08 18:51:17,500,1)
(2013-11-08 18:51:17,502,1)
(2013-11-08 18:51:17,503,1)
(2013-11-08 18:51:17,504,1)
(2013-11-08 18:51:17,506,1)
(2013-11-08 18:51:17,651,6)
(2013-11-08 18:51:17,652,23)
(2013-11-08 18:51:17,653,25)
(2013-11-08 18:51:17,654,19)
(2013-11-08 19:01:13,771,2)
(2013-11-08 21:32:34,522,2)

Performance in PIG

Performance is a concern, since Pig Latin must be translated into one or more MapReduce jobs, and this translation doesn't always produce the most efficient plan. However, for smaller datasets, the lower performance may be offset by eliminating the build phase required when producing your own MapReduce jobs.

Troubleshooting

I spent way more time trying to get PIG working than I felt I should have. The PIG mailing list was very helpful and quick. Here are some pointers.

Agreement of Hadoop version

PIG is compiled against a specific version of Hadoop, so any local Hadoop installation must match the version PIG was compiled against. If the local Hadoop version doesn't agree with the one used to build PIG, you can remove all references to the local Hadoop installation and PIG will fall back to its internal Hadoop libraries. In my case I had to remove the Hadoop binaries from my PATH.

Documentation and examples

There are very few examples showing the use of PIG, and of those that I found, none worked as written. This seems to indicate either that PIG is moving very fast or that the developers are unhappy with the APIs, which change frequently.

References

http://aws.amazon.com/articles/2729
Hadoop: The Definitive Guide


Use Hadoop to Analyze Java Logs (Tomcat catalina.out)

One of the Java applications I develop deploys in Tomcat and is load balanced across a couple dozen servers. Due to the high traffic volume, each server can produce gigabytes of log output daily. This post demonstrates simple use of Hadoop to quickly extract useful and relevant information from catalina.out files using MapReduce. I followed Hadoop: The Definitive Guide for setup and example code.

Installing Hadoop

Hadoop in standalone mode was the most convenient for initial development of the Map Reduce classes. The following commands were executed on a virtual server running RedHat Enterprise Linux 6.3. First verify Java 6 is installed:

[watrous@myhost ~]$ java -version
java version "1.6.0_24"
OpenJDK Runtime Environment (IcedTea6 1.11.5) (rhel-1.50.1.11.5.el6_3-x86_64)
OpenJDK 64-Bit Server VM (build 20.0-b12, mixed mode)

Next, download and extract Hadoop from a mirror. Hadoop can be set up and run locally and does not require any special privileges. Always verify that you have a good download.

[watrous@myhost ~]$ wget http://download.nextag.com/apache/hadoop/common/stable/hadoop-2.2.0.tar.gz
[watrous@myhost ~]$ md5sum hadoop-2.2.0.tar.gz
25f27eb0b5617e47c032319c0bfd9962  hadoop-2.2.0.tar.gz
[watrous@myhost ~]$ tar xzf hadoop-2.2.0.tar.gz
[watrous@myhost ~]$ hdfs namenode -format

That last command creates an HDFS file system in the tmp folder. In my case it was created here: /tmp/hadoop-watrous/dfs/.

Environment variables were added to .bash_profile for JAVA_HOME and HADOOP_INSTALL, as shown. These can also be exported manually each time you log in.

export JAVA_HOME=/usr/lib/jvm/jre
export HADOOP_INSTALL=/home/watrous/hadoop-2.2.0
export PATH=$PATH:$HADOOP_INSTALL/bin

I can now verify that Hadoop is installed and ready to run.

[watrous@myhost ~]$ hadoop version
Hadoop 2.2.0
Subversion https://svn.apache.org/repos/asf/hadoop/common -r 1529768
Compiled by hortonmu on 2013-10-07T06:28Z
Compiled with protoc 2.5.0
From source with checksum 79e53ce7994d1628b240f09af91e1af4
This command was run using /home/watrous/hadoop-2.2.0/share/hadoop/common/hadoop-common-2.2.0.jar

Get some seed data

Now that Hadoop is all setup, I need some seed data to operate on. For this I just reached out and grabbed a log file from one of my production servers.

[watrous@myhost ~]$ mkdir input
[watrous@myhost ~]$ scp watrous@mywebhost.com:/var/lib/tomcat/logs/catalina.out ./input/

Creating Map Reduce Classes

The simplest operation in Hadoop requires a Mapper class, a Reducer class and a third class that identifies the Mapper and Reducer, including the datatypes that connect them. The examples below require two jars from the release downloaded above:

  • hadoop-2.2.0.tar.gz\hadoop-2.2.0.tar\hadoop-2.2.0\share\hadoop\common\hadoop-common-2.2.0.jar
  • hadoop-2.2.0.tar.gz\hadoop-2.2.0.tar\hadoop-2.2.0\share\hadoop\mapreduce\hadoop-mapreduce-client-core-2.2.0.jar

I also use regular expressions in Java to analyze each line in the log. Regular expressions can be more resilient to variations and allow for grouping, which gives easy access to specific data elements. As always, I used Kodos to develop the regular expression.

In the example below, I don’t actually use the log value, but instead I just count up how many occurrences there are by key.

Mapper class

import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
 
public class TomcatLogErrorMapper extends Mapper<LongWritable, Text, Text, Text> {
 
    String pattern = "([0-9]{4}-[0-9]{2}-[0-9]{2}\\s[0-9]{2}:[0-9]{2}:[0-9]{2}.[0-9]{3})\\s*([a-zA-Z]+)\\s*([a-zA-Z.]+)\\s*-\\s*(.+)$";
    // Create a Pattern object
    Pattern r = Pattern.compile(pattern);
 
    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
 
        Matcher m = r.matcher(line);
        if (m.find()) {
            // only consider ERRORs for this example
            if (m.group(2).contains("ERROR")) {
                  // example log line
                  // 2013-11-08 04:06:56,586 DEBUG component.helpers.GenericSOAPConnector  - Attempting to connect to: https://remotehost.com/app/rfc/entry/msg_status
//                System.out.println("Found value: " + m.group(0)); //complete line
//                System.out.println("Found value: " + m.group(1)); // date
//                System.out.println("Found value: " + m.group(2)); // log level
//                System.out.println("Found value: " + m.group(3)); // class
//                System.out.println("Found value: " + m.group(4)); // message
                context.write(new Text(m.group(1)), new Text(m.group(2) + m.group(3) + m.group(4)));
            }
        }
    }
}

Reducer class

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
 
public class TomcatLogErrorReducer extends Reducer<Text, Text, Text, IntWritable> {
 
    @Override
    public void reduce(Text key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        int countValue = 0;
        for (Text value : values) {
            countValue++;
        }
        context.write(key, new IntWritable(countValue));
    }
}

Job class with main

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
 
public class TomcatLogError {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: TomcatLogError <input path> <output path>");
            System.exit(-1);
        }
        Job job = new Job();
        job.setJarByClass(TomcatLogError.class);
        job.setJobName("Tomcat Log Error");
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setMapperClass(TomcatLogErrorMapper.class);
        job.setReducerClass(TomcatLogErrorReducer.class);
        // map emits (Text, Text); reduce emits (Text, IntWritable)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Running Hadoop

In NetBeans I made sure that the Main Class of the compiled jar was TomcatLogError. I then ran Clean and Build to get a jar, which I transferred up to the server where I installed Hadoop.

[watrous@myhost ~]$ hadoop jar HadoopExample.jar input/catalina.out ~/output
13/11/11 19:20:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
13/11/11 19:20:52 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
...
13/11/11 18:36:57 INFO mapreduce.Job: Job job_local1725513594_0001 completed successfully
13/11/11 18:36:57 INFO mapreduce.Job: Counters: 27
        File System Counters
                FILE: Number of bytes read=430339145
                FILE: Number of bytes written=1057396
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
        Map-Reduce Framework
                Map input records=1101516
                Map output records=105
                Map output bytes=20648
                Map output materialized bytes=20968
                Input split bytes=396
                Combine input records=0
                Combine output records=0
                Reduce input groups=23
                Reduce shuffle bytes=0
                Reduce input records=105
                Reduce output records=23
                Spilled Records=210
                Shuffled Maps =0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=234
                CPU time spent (ms)=0
                Physical memory (bytes) snapshot=0
                Virtual memory (bytes) snapshot=0
                Total committed heap usage (bytes)=1827143680
        File Input Format Counters
                Bytes Read=114455257
        File Output Format Counters
                Bytes Written=844

The output folder now contains a file named part-r-00000 with the results of the processing.

[watrous@c0003913 ~]$ more output/part-r-00000
2013-11-08 04:04:51,894 2
2013-11-08 05:04:52,711 2
2013-11-08 05:33:23,073 3
2013-11-08 06:04:53,689 2
2013-11-08 07:04:54,366 3
2013-11-08 08:04:55,096 2
2013-11-08 13:34:28,936 2
2013-11-08 17:32:31,629 3
2013-11-08 18:51:17,357 1
2013-11-08 18:51:17,423 1
2013-11-08 18:51:17,491 1
2013-11-08 18:51:17,499 1
2013-11-08 18:51:17,500 1
2013-11-08 18:51:17,502 1
2013-11-08 18:51:17,503 1
2013-11-08 18:51:17,504 1
2013-11-08 18:51:17,506 1
2013-11-08 18:51:17,651 6
2013-11-08 18:51:17,652 23
2013-11-08 18:51:17,653 25
2013-11-08 18:51:17,654 19
2013-11-08 19:01:13,771 2
2013-11-08 21:32:34,522 2

Based on this analysis, a burst of errors occurred around 18:51:17. Now that I know when the errors happened, it is easy to change the Mapper class to emit a different key, such as the class or message, to identify more precisely what the errors were.
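
As a hedged illustration, the only change needed in TomcatLogErrorMapper is the key passed to context.write; emitting the class (group 3) instead of the timestamp (group 1) would count errors per class:

            // illustrative variation: key by class so the reducer counts errors per class
            if (m.group(2).contains("ERROR")) {
                context.write(new Text(m.group(3)), new Text(m.group(1) + " " + m.group(4)));
            }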

Increasing scale

The Mapper and Reducer classes can be enhanced to give more relevant details. The process of transferring the files can also be automated, and the input path can be pointed at a whole directory rather than a single file, as shown in the sketch below. Reports can also be aggregated and placed in a web directory or emailed.
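
FileInputFormat already accepts directories and glob patterns as input paths, so no custom directory walking is required; the paths below are illustrative, not from the project above.

        // process every file in a directory, plus rotated logs matched by a glob
        FileInputFormat.addInputPath(job, new Path("input/"));
        FileInputFormat.addInputPath(job, new Path("archive/catalina.out.*"));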


Override Java Methods on Instantiation

Most Java programmers are very familiar with the mechanism to extend a class. To do this, you simply create a new class and specify that it extends another class. You can then add functionality not available in the original class, AND you can also override any functionality that existed already. Imagine a simple class:

public class Message {
    private String message;
 
    public Message(String message) {
        this.message = message;
    }
 
    public void showMessage() {
        System.out.println(message);
    }
}

Suppose you want these messages sent to the log too. You could extend the class as follows:

import java.util.logging.Logger;
 
public class LoggedMessage extends Message {
    // the parent's message field is private, so keep a local copy for logging
    private String message;
 
    public LoggedMessage(String message) {
        super(message);
        this.message = message;
    }
 
    @Override
    public void showMessage() {
        Logger.getLogger(LoggedMessage.class.getName()).info(message);
        System.out.println(message);
    }
}

Now you could create an instance of LoggedMessage rather than Message and have all messages logged in addition to just displaying them. Pretty neat, but there might be a better way.

Override on instantiation

Imagine that you only needed messages logged in one case. In other words, the Message class is sufficient in all cases except one. In that case, you can simply create an instance of Message that overrides this functionality. This is also referred to as an anonymous subclass.

final String text = "Something happened";
Message loggedMessage = new Message(text) {
    @Override
    public void showMessage() {
        // "text" is captured from the enclosing scope, since the
        // message field in Message is private
        Logger.getLogger(Message.class.getName()).info(text);
        super.showMessage();
    }
};

Why is this great? With this approach, you avoid adding another class with an uncommon use case to your software. You still have only one Message class, yet you benefit from the modified behavior of a subclass.

Another use case is when you want to use Message but showMessage frequently differs. Without this technique, you would end up with many subclasses to accommodate all the use cases.

When to extend

If you find yourself copying and pasting the same override in many places, you should probably create a subclass once. You may also find that what differs each time you override the method at instantiation is a parameter that could be injected. In that case, simply define an injection point in your class and let your DI framework provide a Message object configured the way you need it, as in the sketch below.
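
Here is a minimal sketch of that injection-point idea, assuming a small strategy interface rather than any specific DI framework; all names are illustrative, and the two types would live in separate files.

public interface MessagePrinter {
    void print(String message);
}
 
public class Message {
    private String message;
    private MessagePrinter printer;
 
    public Message(String message, MessagePrinter printer) {
        this.message = message;
        this.printer = printer;
    }
 
    public void showMessage() {
        // behavior varies with the injected printer, not with subclassing
        printer.print(message);
    }
}

A DI framework would then bind MessagePrinter to a console, logging or combined implementation as configuration dictates.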


Caching in Java using Aspect Oriented Programming (AOP)

As the scale of web applications increases, performance optimization considerations are more frequently included in initial design. One optimization technique used extensively is caching. A cache contains pre-processed data that is ready to use without redoing the processing. Processing may include extensive computation and accessing data over a network or on disk.

Keep the details out of your code

One design consideration when introducing a caching mechanism is to keep the details out of your code. Most caches are just simple key-value stores, so it's tempting to introduce them directly wherever you have a performance issue. Imagine this Java method:

    public Shop getShopsNear(Shop shop) {
        Shop returnShop;
        // query google maps api with shop data to get region
        // query database for region
        // calculate center of region
        // sort by distance from center
        returnShop = closestShopToCenter;
        return returnShop;
    }

Obviously this has the potential to take a long time to process. It can also be expected that there is a reasonably small number of shops, and even if there are several thousand, caching several thousand results in memory is very manageable.

Wrong way

It might be tempting to jump right in and introduce caching like this:

    public Shop getShopsNear(Shop shop) {
        Shop returnShop;
        // create a cache key
        String cacheKey = "getShopsNear: " + shop.toString();
        // check for cached result
        Cache cache = CacheProvider.getCache();
        returnShop = cache.get(cacheKey);
        if (returnShop == null) {
            // query google maps api with shop data to get region
            // query database for region
            // calculate center of region
            // sort by distance from center
            returnShop = closestShopToCenter;
            // cache result
            cache.put(cacheKey, returnShop);
        }
        return returnShop;
    }

At first glance, that’s great. It follows most quick start tutorials for caching solutions and it will improve the performance of your code. However, there are several problems with this approach. I highlight some here.

  • Higher risk of key conflicts and harder debugging, since keys are managed at the method level
  • The code is tied to a specific cache implementation
  • The method is less clear due to clutter from caching

AOP – a better way

One way to get around the problems above, and to have a caching mechanism that will grow with your application and provide flexibility, is to use Aspect Oriented Programming. Google Guice also refers to this as method interception. The idea is that you identify some pre-processing that should take place before the actual method is called.

In this case, we want to put a cache interceptor at the method level and keep the mechanics of caching, such as key generation and cache provider selection, centralized. Here's what that might look like.

    @Cached(timeToLiveSeconds = 3600)
    public Shop getShopsNear(Shop shop) {
        Shop returnShop;
        // query google maps api with shop data to get region
        // query database for region
        // calculate center of region
        // sort by distance from center
        returnShop = closestShopToCenter;
        return returnShop;
    }

The only change to your original method is the @Cached annotation. You also have the option of defining how long the data lives in the cache, or you can leave that out to use the default duration.

The method interceptor code deals with the selection of the cache provider, key generation, and so on. This makes adding caching to a method fast and effortless, and it keeps your application code clear. Changes in the future are now easy to configure.
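
Here is a minimal sketch of what such an interceptor might look like, assuming a custom @Cached annotation with runtime retention and a simple Cache abstraction; CacheProvider, Cache and the key scheme are illustrative, not my library's actual API.

import java.util.Arrays;
import org.aopalliance.intercept.MethodInterceptor;
import org.aopalliance.intercept.MethodInvocation;
 
public class CacheInterceptor implements MethodInterceptor {
 
    public Object invoke(MethodInvocation invocation) throws Throwable {
        Cache cache = CacheProvider.getCache(); // hypothetical provider
        // derive the key from the method signature and arguments
        String cacheKey = invocation.getMethod().toGenericString()
                + Arrays.deepToString(invocation.getArguments());
        Object cached = cache.get(cacheKey);
        if (cached != null) {
            return cached; // cache hit: skip the expensive method entirely
        }
        // cache miss: run the real method and store the result
        Object result = invocation.proceed();
        Cached annotation = invocation.getMethod().getAnnotation(Cached.class);
        cache.put(cacheKey, result, annotation.timeToLiveSeconds());
        return result;
    }
}

Wiring it up in a Guice module is then a single statement: bindInterceptor(Matchers.any(), Matchers.annotatedWith(Cached.class), new CacheInterceptor());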

Getting Started

I’ve created an AOP caching for Guice project. Fork the code or use it as is.

References

My library is based on the initial work by bpoulson.


Using Java to work with Versioned Data

A few days ago I wrote about how to structure version details in MongoDB. In this and subsequent articles I’m going to present a Java based approach to working with that revision data.

I have published all this work as an open source repository on github. Feel free to fork it:
https://github.com/dwatrous/mongodb-revision-objects

Design Decisions

To begin with, here are a few design rules that should direct the implementation:

  1. Program to interfaces. Choice of datastore or other technologies should not be visible in application code
  2. Application code should never deal with versioned objects. It should only deal with domain objects

Starting with the first rule, I came up with a design involving only five interfaces. Persistence of the Person domain object is handled using VersionedPerson, Person, HistoricalPerson and PersonDAO. A fifth interface, DisplayMode, is used to facilitate display of the correct versioned data in the application. Here's what the Person interface looks like:

public interface Person {
    PersonName getName();
    void setName(PersonName name);
    Integer getAge();
    void setAge(Integer age);
    String getEmail();
    void setEmail(String email);
    boolean isHappy();
    void setHappy(boolean happy);
    public interface PersonName {
        String getFirstName();
        void setFirstName(String firstName);
        String getLastName();
        void setLastName(String lastName);
    }
}

Note that there is no indication of any datastore-related artifacts, such as an ID attribute. It also does not include any specifics about versioning, like historical metadata. This is a clean interface that should be used throughout the application code anywhere a Person is needed.

During implementation you’ll see that using a dependency injection framework makes it easy to write application code against this interface and provide any implementation at run time.

Versioning

Obviously it’s necessary to deal with the versioning somewhere in the code. The question is where and how. According to point B above, I want to conceal any hint of the versioned structure from application code. To illustrate, let’s imagine a bit of code that would retrieve and display a person’s name and email.

First I show you what you want to avoid (i.e. DO NOT DO THIS).

Person personToDisplay;
VersionedPerson versionedPerson = personDao.getPersonByName(personName);
if (displayMode.isPreviewModeActive()) {
    personToDisplay = versionedPerson.getDraft();
} else {
    personToDisplay = versionedPerson.getPublished();
}
System.out.println(personToDisplay.getName().getFirstName());
System.out.println(personToDisplay.getEmail());

There are a few problems with this approach that might not be obvious from this simple example. One is that by allowing the PersonDAO to return a VersionedPerson, it becomes necessary to include conditional code everywhere in your application that you access a Person object. Imagine how costly a simple change to DisplayMode could be over time, not to mention the chance of bugs creeping in.

Another problem is that your application, which deals with Person objects, now has code throughout that introduces concepts of VersionedPerson, HistoricalPerson, etc.

In the end, all of those details relate to data access. In other words, your Data Access Object needs to be aware of these details, but the rest of your application does not. By moving all these details into your DAO, you can rewrite the above example to look like this.

Person personToDisplay = personDao.getPersonByName(personName);
System.out.println(personToDisplay.getName().getFirstName());
System.out.println(personToDisplay.getEmail());

As you can see, this keeps your application code much cleaner. The DAO has the responsibility to determine which Person object to return.

DAO design

Let’s have a closer look at the DAO. Here’s the PersonDAO interface:

public interface PersonDAO {
    void save(Person person);
    void saveDraft(Person person);
    void publish(Person person);
    Person getPersonByName(PersonName name);
    Person getPersonByName(PersonName name, Integer historyMarker);
    List<Person> getPersonsByLastName(String lastName);
}

Notice that the DAO only ever receives or returns Person objects and search parameters. At the interface level, there is no indication of an underlying datastore or other technology. There is also no indication of any versioning. This encourages application developers to keep application code clean.

Despite this clean interface, there are some complexities. Based on the structure of the MongoDB document, which stores published, draft and history as nested documents in a single document, there is only one ObjectId that identifies all versions of the Person. That means the ObjectId exists at the VersionedPerson level, not the Person level, which makes it necessary to pass some information around with the Person that identifies the VersionedPerson for write operations. This comes through in the implementation of the MorphiaPersonDAO.
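
A minimal sketch of that selection logic, assuming Morphia's fluent query API, an injected DisplayMode and an illustrative MongoVersionedPerson entity class:

    // hypothetical excerpt from MorphiaPersonDAO: the DAO, not the caller,
    // decides whether to return the draft or the published version
    public Person getPersonByName(PersonName name) {
        VersionedPerson versionedPerson = datastore.find(MongoVersionedPerson.class)
                .field("published.name.lastName").equal(name.getLastName())
                .field("published.name.firstName").equal(name.getFirstName())
                .get();
        if (displayMode.isPreviewModeActive()) {
            return versionedPerson.getDraft();
        }
        return versionedPerson.getPublished();
    }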

Download

You can clone or download the mongodb-revision-objects code and dig in to the details yourself on github.


Wicket + Guice including unittests

Last week I spent way too much time integrating Apache Wicket and Google Guice. Yikes! The most difficult part for me was getting the initialization to happen in the right order. A big Thank You to Dan Retzlaff on the Wicket list for helping work through these details.

The details below were applied to a Wicket quickstart project for Wicket 6.0.0.

Design Decisions

It was important to me to keep the application tier separate from the web tier. I actually maintain each in a separate repository. I have several motivations for this, such as:

  • Clean separation of concerns. In other words, prevent logic from ending up in my Wicket pages
  • Independent revisions and release cycles between web tier and application tier
  • Easier to divide work between scrum teams

I include the application tier into the Wicket front end as a jar. By the way, this also makes it easy to include my application tier into a Jersey REST interface and other legacy servlets.

It was also important to maintain a mock package for fast unittests. Since the data providers are managed in the application tier, the mock package lives there, further enforcing the separation of concerns.

Implementation Approach

After some experimentation I decided to use the GuiceServlet approach. I started by adding the following maven dependencies to my pom.xml for the Wicket quickstart.

        <dependency>
            <groupId>com.google.inject</groupId>
            <artifactId>guice</artifactId>
            <version>3.0</version>
        </dependency>
        <dependency>
            <groupId>com.google.inject.extensions</groupId>
            <artifactId>guice-servlet</artifactId>
            <version>3.0</version>
            <type>jar</type>
        </dependency>
        <dependency>
            <groupId>org.apache.wicket</groupId>
            <artifactId>wicket-guice</artifactId>
            <version>6.7.0</version>
            <type>jar</type>
        </dependency>

My web.xml defines only the GuiceFilter and a listener to initialize the injector when the servlet context is created.

<?xml version="1.0" encoding="ISO-8859-1"?>
<web-app xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/web-app_2_5.xsd"
	version="2.5">
 
	<display-name>guicewickettest</display-name>
 
        <listener>
            <listener-class>com.danielwatrous.myapp.web.MyGuiceServletConfig</listener-class>
        </listener>
 
        <filter>
            <filter-name>guiceFilter</filter-name>
            <filter-class>com.google.inject.servlet.GuiceFilter</filter-class>
        </filter>
 
        <filter-mapping>
            <filter-name>guiceFilter</filter-name>
            <url-pattern>/*</url-pattern>
        </filter-mapping>
</web-app>

As you’ll see, this keeps configuration details in code, rather than XML. The web.xml remains simple. That brings us to the GuiceServletConfig, which is where we initialize the injector.

You may recall that listeners receive event notifications at well defined points in the life cycle of a web application or session. Here’s the MyGuiceServletConfig that’s referenced in web.xml:

package com.danielwatrous.myapp.web;
 
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.google.inject.servlet.GuiceServletContextListener;
import com.danielwatrous.myapp.modules.MongoMyappModule;
 
public class MyGuiceServletConfig extends GuiceServletContextListener {
 
    @Override
    protected Injector getInjector() {
        return Guice.createInjector(new MyappServletModule(), new MongoMyappModule());
    }
 
}

The GuiceServletContextListener implements ServletContextListener, which ends up executing when the application context is created (or destroyed). This way we have the injector available before executing any application code.

Another thing you may notice is that I create the injector with two modules, not just one. The first module is the ServletModule, and I’ll show that to you in a minute. The next module is the application tier module that has been included as a jar in the Wicket application. The single injector available throughout the Wicket application will be able to inject servlet/Wicket related components in addition to application tier components.

Let’s have a closer look at MyappServletModule:

package com.danielwatrous.myapp.web;
 
import com.google.inject.Inject;
import com.google.inject.Provider;
import com.google.inject.Scopes;
import com.google.inject.Singleton;
import com.google.inject.servlet.ServletModule;
import java.util.HashMap;
import java.util.Map;
import org.apache.wicket.protocol.http.IWebApplicationFactory;
import org.apache.wicket.protocol.http.WebApplication;
import org.apache.wicket.protocol.http.WicketFilter;
 
public class MyappServletModule extends ServletModule {
    @Override
    protected void configureServlets() {
        filter("/*").through(WicketFilter.class, createWicketFilterInitParams());
        bind(WebApplication.class).to(WicketApplication.class);
        bind(WicketFilter.class).to(CustomWicketFilter.class).in(Scopes.SINGLETON);
    }
 
    @Singleton
    private static class CustomWicketFilter extends WicketFilter {
 
        @Inject
        private Provider<WebApplication> webApplicationProvider;
 
        @Override
        protected IWebApplicationFactory getApplicationFactory() {
            return new IWebApplicationFactory() {
                @Override
                public WebApplication createApplication(WicketFilter filter) {
                    return webApplicationProvider.get();
                }
 
                @Override
                public void destroy(WicketFilter filter) {
                }
            };
        }
    }
 
    private Map<String, String> createWicketFilterInitParams() {
        Map<String, String> wicketFilterParams = new HashMap<String, String>();
        wicketFilterParams.put(WicketFilter.FILTER_MAPPING_PARAM, "/*");
        wicketFilterParams.put("applicationClassName", "com.danielwatrous.myapp.web.WicketApplication");
        return wicketFilterParams;
    }
}

You may notice that I added a mechanism to provide WicketFilter with additional parameters. Next I bind my WebApplication and WicketFilter classes to specific implementations. The CustomWicketFilter overrides the typical behavior of the WicketFilter which usually takes a string reference to the WebApplication class. Instead it now uses the injected WebApplication object.

As you’ll see below, this step of injecting the desired WebApplication is critical to enabling unittests, primarily because it allows us to construct the WebApplication with an injector.

package com.danielwatrous.myapp.web;
 
import com.google.inject.Inject;
import com.google.inject.Injector;
import org.apache.wicket.guice.GuiceComponentInjector;
import org.apache.wicket.protocol.http.WebApplication;
 
public class WicketApplication extends WebApplication {    	
    private final Injector injector;
 
    @Inject
    public WicketApplication(Injector injector) {
        this.injector = injector;
    }
 
    @Override
    public Class<HomePage> getHomePage() {
        return HomePage.class;
    }
 
    @Override
    public void init() {
        super.init();
        getComponentInstantiationListeners().add(new GuiceComponentInjector(this, injector));
    }
}

At this point Wicket and Guice are successfully integrated. Let’s have a look at what needs to happen to make the unittests work.

Unittests

The only real change that’s required to make the unittests work is in the setUp function of the unittest. Since the WebApplication above was modified to receive an injector, all we need to do is create an injector and provide it at instantiation.

package com.danielwatrous.myapp;
 
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.danielwatrous.myapp.modules.TestMyappModule;
import com.danielwatrous.myapp.web.HomePage;
import com.danielwatrous.myapp.web.WicketApplication;
import org.apache.wicket.util.tester.WicketTester;
import org.junit.Before;
import org.junit.Test;
 
public class TestHomePage {
 
    private WicketTester tester;
 
    @Before
    public void setUp() {
        Injector injector = Guice.createInjector(new TestMyappModule());
        tester = new WicketTester(new WicketApplication(injector));
    }
 
    @Test
    public void homepageRendersSuccessfully() {
        //start and render the test page
        tester.startPage(HomePage.class);
 
        //assert rendered page class
        tester.assertRenderedPage(HomePage.class);
    }
}

You’ll notice in this case I create an injector that doesn’t include the ServletModule and uses the TestMyappModule. Since the Wicket unittests don’t operate within a full web context I don’t need the ServletModule. Additionally my TestMayappModule makes use of my mock package. This allows the tests to run without access to any external resources. It also keeps the tests very fast!

Wicket Pages

Accessing the injector in your Wicket pages is easy. All you need to do is inject it. Here’s how that looks:

package com.danielwatrous.myapp.web;
 
import com.google.inject.Inject;
import com.google.inject.Injector;
import com.danielwatrous.myapp.domain.QuickLink;
import org.apache.wicket.markup.html.WebPage;
import org.apache.wicket.markup.html.basic.Label;
import org.apache.wicket.request.mapper.parameter.PageParameters;
 
public class HomePage extends WebPage {
    private static final long serialVersionUID = 1L;
    @Inject private Injector injector;
 
    public HomePage(final PageParameters parameters) {
	super(parameters);
 
	add(new Label("version", getApplication().getFrameworkSettings().getVersion()));
 
        // TODO Add your page's components here
        QuickLink quickLink = injector.getInstance(QuickLink.class);
	add(new Label("quickLink ", quickLink.buildQuickLink()));
    }
}

Great Combination

After working through the particulars, this implementation feels clean and flexible. Configuration is in the code and benefits from compile time type checking. Unittests are working and fast with very little modification to the traditional Wicket approach, which keeps the application testable.

Resources:

http://apache-wicket.1842946.n4.nabble.com/Wicket-Guice-unittests-td4652853.html
https://gist.github.com/3880246


Redis as a cache for Java servlets

I’ve been refactoring an application recently to move away from a proprietary and inflexible in memory datastore. The drawbacks of the proprietary datastore included the fact that the content was static. The only way to update data involved a build and replication process that took much longer than the stakeholders were willing to wait. The main selling point in favor of the in memory datastore was that it is blazing fast. And I mean blazing fast.

My choice for a replacement datastore technology was MongoDB. MongoDB worked great, but the profiling and performance comparison naturally showed that the in-memory solution outperformed the MongoDB solution. Communication with MongoDB for every request was obviously much slower than the previous in-memory datastore, and the response time was less consistent from one request to another.

Caching for performance

When the data used to generate a response changes infrequently, it's generally bad design to regenerate dynamic content on every page load. Enter caching. There is a host of caching approaches covering everything from reverse proxies, like Varnish, to platform-specific solutions, like EHCache.

As a first stab, I chose a golden oldie, memcached, and an up-and-coming alternative, Redis. There's some lively discussion online about the performance differences between these two technologies. Ultimately I chose Redis due to the active development on the platform and its feature set.

Basic cache

In Java there are a handful of available Redis drivers. I started with the Jedis client. In order to use Jedis, I added this to my pom.xml.

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>2.0.0</version>
    <type>jar</type>
    <scope>compile</scope>
</dependency>

I then modified my basic servlet to initialize a JedisPool and use Jedis to cache the values I was retrieving from MongoDB. Here's what my class ended up looking like.

package com.danielwatrous.cachetest;
 
import com.google.inject.Guice;
import com.google.inject.Injector;
import com.danielwatrous.linker.domain.WebLink;
import com.danielwatrous.linker.modules.MongoLinkerModule;
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPool;
import redis.clients.jedis.JedisPoolConfig;
 
public class BuildCnavLink extends HttpServlet {
 
    private static Injector hbinjector = null;
    private static JedisPool pool = null;
 
    @Override
    public void init() {
        hbinjector = Guice.createInjector(new MongoLinkerModule());
        pool = new JedisPool(new JedisPoolConfig(), "localhost", 6379);
    }
 
    @Override
    public void destroy() {
        pool.destroy();
    }
 
    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/xml;charset=UTF-8");
        PrintWriter out = response.getWriter();
        String value = "";
        Jedis jedis = null;
 
        try {
            jedis = pool.getResource();
            String cacheKey = getCacheKey (request.getParameter("country"), request.getParameter("lang"), request.getParameter("company"));
            value = jedis.get(cacheKey);
            if (value == null) {
                WebLink webLink = hbinjector.getInstance(WebLink.class);
                webLink.setLanguage(request.getParameter("lang"));
                webLink.setCountry(request.getParameter("country"));
                webLink.setCompany(request.getParameter("company"));
                value = webLink.buildWebLink();
                jedis.set(cacheKey, value);
            }
        } finally {
            pool.returnResource(jedis);            
        }
 
        try {
            out.println("<link>");
            out.println("<url>" + value + "</url>");
            out.println("</link>");
        } finally {            
            out.close();
        }
    }
 
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        processRequest(request, response);
    }
 
    protected String getCacheKey (String country, String lang, String company) {
        // delimit the values to avoid key collisions (e.g. "us"+"en" vs "use"+"n")
        String cacheKey = country + ":" + lang + ":" + company;
        return cacheKey;
    }
}

Observations

It’s assumed that a combination of country, lang and company will produce a unique value when buildWebLink is called. That must be the case if you’re using those to generate a cache key.

There’s also nothing built in above to invalidate the cache. In order to validate the cache it may work to build a time/age check. There may be other more sophisticated optimistic or pessimistic algorithms to manage cached content.

In the case above, I’m using redis to store a simple String value. I’m also still generating a dynamic response, but I’ve effectively moved the majority of my data calls to redis.

Conclusion

As a first stab, this performs on par with the proprietary in-memory solution that we're replacing, and the consistency from one request to another is very tight. Here I'm connecting to a local Redis instance; if Redis were on a remote box, network latency might erase these gains. Object storage or serialization may also affect performance if it's determined that simple String caching isn't sufficient or desirable.

Resources

http://www.ibm.com/developerworks/java/library/j-javadev2-22/index.html
https://github.com/xetorthio/jedis/wiki/Getting-started


RESTful Java Servlet: Serializing to/from JSON with Jackson

In a previous article I demonstrated one way to create a RESTful interface using a plain Java Servlet. In this article I wanted to extend that to include JSON serialization using Jackson.

I found a very simple article showing a basic case mapping a POJO to JSON and back again. However, when I copied it straight over I got the following error:

org.codehaus.jackson.map.JsonMappingException: No serializer found for class DataClass and no properties discovered to create BeanSerializer (to avoid exception, disable SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS)

I found there were two ways to get past that error. The first was to use the Jackson annotations to define properties more directly. The other was to add getters and setters to the class. Here is my DataClass with both configurations.

import java.util.ArrayList;
import java.util.List;
 
import org.codehaus.jackson.annotate.JsonAutoDetect;
import org.codehaus.jackson.annotate.JsonIgnoreProperties;
import org.codehaus.jackson.annotate.JsonProperty;
 
@JsonAutoDetect   // use this annotation if you don't have getters and setters for each JsonProperty
//@JsonIgnoreProperties(ignoreUnknown = true)    // use this if there isn't an exact correlation between JSON and class properties
public class DataClass {
 
  //@JsonProperty("theid") // use this if the property in JSON has a different identifier than your class
  @JsonProperty
  private int id = 1;
 
  @JsonProperty
  private String name = "Test Class";
 
  @JsonProperty
  private List<String> messages = new ArrayList<String>() {
    {
      add("msg 1");
      add("msg 2");
      add("msg 3");
    }
  };
}
import java.util.ArrayList;
import java.util.List;
 
public class DataClass {
 
  private int id = 1;
 
  private String name = "Test Class";
 
  private List<String> messages = new ArrayList<String>() {
    {
      add("msg 1");
      add("msg 2");
      add("msg 3");
    }
  };
 
  public String toString() {
    return "User [id=" + id + ", name=" + name + ", " + "messages=" + messages + "]";
  }
  public int getId() {
    return id;
  }
  public void setId(int id) {
    this.id = id;
  }
  public String getName() {
    return name;
  }
  public void setName(String name) {
    this.name = name;
  }
  public List<String> getMessages() {
    return messages;
  }
  public void setMessages(List<String> messages) {
    this.messages = messages;
  }
}

We are now free to modify the doGet and doPost methods in our original servlet to serialize and deserialize the DataClass to and from JSON. Here’s the code for that:

...
  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    PrintWriter out = response.getWriter();
 
    DataClass mydata = new DataClass();
    ObjectMapper mapper = new ObjectMapper();
 
    try {
      // display to console
      out.println(mapper.writeValueAsString(mydata));
    } catch (JsonGenerationException e) {
      e.printStackTrace();
    } catch (JsonMappingException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
    out.close();
  }
 
  protected void doPost(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    PrintWriter out = response.getWriter();
 
    ObjectMapper mapper = new ObjectMapper();
 
    try {
      // read from file, convert it to user class
      DataClass user = mapper.readValue(request.getReader(), DataClass.class);
      // display to console
      out.println(user);
    } catch (JsonGenerationException e) {
      e.printStackTrace();
    } catch (JsonMappingException e) {
      e.printStackTrace();
    } catch (IOException e) {
      e.printStackTrace();
    }
    out.close();
  }
...
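
For reference, serializing the DataClass above produces JSON along these lines (property order may vary):

{"id":1,"name":"Test Class","messages":["msg 1","msg 2","msg 3"]}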

Now you can easily serialize data to and from JSON using Jackson and POJOs without the need for a mapping file. There are even convenient annotations available that allow you to accommodate differences between the JSON and POJO properties.


RESTful Java Servlet

For a recent project I found that a RESTful interface would be appropriate. My first inclination was to use Jersey (or one of the other JAX-RS implementations available). The environment where this new REST API would deploy is still using Java 1.5. This became a major roadblock when I found that none of the JAX-RS implementations support the Java 1.5 virtual machine. This is not surprising since it's a few YEARS past EOSL (end of support life) for Java 1.5, but disappointing all the same.

After spending a day or so with the available frameworks, trying to get one of them to work on Java 1.5, I grew nervous that even if I did succeed, my implementation would sit squarely in a non-supported region, even on the mailing lists. So I decided to see how close I could come to a simple, reliable RESTful interface using a plain old Java servlet (POJS?).

While I’m disappointed that I’m unable to leverage an existing, mature framework, I was happy with the outcome. Here are the details

HTTP Methods built in

To begin with, the servlet specification defines methods for the HTTP actions: GET, POST, PUT, DELETE, OPTIONS and HEAD.

Each of these methods receives an HttpServletRequest and an HttpServletResponse object, which makes it easy to access the payload of the request and construct a response. Manipulation of return headers, payload content and response codes (200, 404, etc.) is all available directly through the HttpServletResponse object.
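
As a small illustration (the status code and payload here are arbitrary), a handler can shape the entire REST response directly:

    // illustrative: return a 404 with a JSON error payload
    response.setStatus(404);
    response.setContentType("application/json");
    response.getWriter().println("{\"error\": \"resource not found\"}");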

The only imports required to accomplish this were available in the standard Java environment, which relieved me of the need to worry about interoperability.

import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

URI mapping

The next challenge was how to map RESTful URIs onto the methods in my servlet. At first glance, the conventional mapping done in the web.xml file wouldn't be sufficient. The problem was the extra parameters embedded in the URI, which go beyond what the web.xml mapping can delineate.

What I found was that a wildcard mapping and use of HttpServletRequest.getPathInfo() gave me all the information that I needed to complete the mapping. This is what my web.xml file looks like

<?xml version="1.0" encoding="UTF-8"?>
<web-app ...>
	<display-name>testrestservlet</display-name>
	<servlet>
		<description></description>
		<display-name>TestRestServlet</display-name>
		<servlet-name>TestRestServlet</servlet-name>
		<servlet-class>com.danielwatrous.testrestservlet.TestRestServlet</servlet-class>
	</servlet>
	<servlet-mapping>
		<servlet-name>TestRestServlet</servlet-name>
		<url-pattern>/api/v1/*</url-pattern>
	</servlet-mapping>
</web-app>

The call I mentioned above, getPathInfo(), returns a string with everything that is in place of the asterisk (*) in my url-pattern above. This means that a URI of the form http://server/testrestservlet/api/v1/resource/id would return a string of "/resource/id" when getPathInfo() is called.

Avoid String Handling

I wanted to avoid messy string manipulation or analysis to parse out the details of this resource identification information. This ruled out any string splitting and checks with indexOf or startsWith.

Instead I wanted to be deliberate and reduce the possibility that a mis-mapping would slip through or that a future developer would misunderstand exactly what to expect. For this reason I chose to use regular expressions. They provide a clear pattern of what my URI should look like, which will be obvious to other developers, and they reduce the chances of a bad URI producing an inconsistent result.

I always use Kodos to develop regular expressions. After I got the regular expressions worked out for each class, I created an inner class inside the servlet that would help me map the URI to a specific request and give me access to the parameters embedded in the request URI. Here’s what that inner class looks like.

  private class RestRequest {
    // Accommodate two requests, one for all resources, another for a specific resource
    private Pattern regExAllPattern = Pattern.compile("/resource");
    private Pattern regExIdPattern = Pattern.compile("/resource/([0-9]*)");
 
    private Integer id;
 
    public RestRequest(String pathInfo) throws ServletException {
      // regex parse pathInfo
      Matcher matcher;
 
      // Check for ID case first, since the All pattern would also match
      matcher = regExIdPattern.matcher(pathInfo);
      if (matcher.find()) {
        id = Integer.parseInt(matcher.group(1));
        return;
      }
 
      matcher = regExAllPattern.matcher(pathInfo);
      if (matcher.find()) return;
 
      throw new ServletException("Invalid URI");
    }
 
    public Integer getId() {
      return id;
    }
 
    public void setId(Integer id) {
      this.id = id;
    }
  }

Now I’m set to override the methods for each of my HTTP actions.

The Servlet

At this point, creating the servlet is trivial. The mapping provides all path (resource) information to the servlet, and our inner class is responsible for determining whether the URI makes sense for our application and for making that information available in a sensible way for processing the request in the action methods. Here's what I came up with:

package com.danielwatrous.testrestservlet;
 
import java.io.IOException;
import java.io.PrintWriter;
 
import java.util.regex.Pattern;
import java.util.regex.Matcher;
 
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
 
public class TestRestServlet extends HttpServlet {
 
  private class RestRequest {
    // Accommodate two requests, one for all resources, another for a specific resource
    private Pattern regExAllPattern = Pattern.compile("/resource");
    private Pattern regExIdPattern = Pattern.compile("/resource/([0-9]*)");
 
    private Integer id;
 
    public RestRequest(String pathInfo) throws ServletException {
      // regex parse pathInfo
      Matcher matcher;
 
      // Check for ID case first, since the All pattern would also match
      matcher = regExIdPattern.matcher(pathInfo);
      if (matcher.find()) {
        id = Integer.parseInt(matcher.group(1));
        return;
      }
 
      matcher = regExAllPattern.matcher(pathInfo);
      if (matcher.find()) return;
 
      throw new ServletException("Invalid URI");
    }
 
    public Integer getId() {
      return id;
    }
 
    public void setId(Integer id) {
      this.id = id;
    }
  }
 
  protected void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    PrintWriter out = response.getWriter();
 
    out.println("GET request handling");
    out.println(request.getPathInfo());
    out.println(request.getParameterMap());
    try {
      RestRequest resourceValues = new RestRequest(request.getPathInfo());
      out.println(resourceValues.getId());
    } catch (ServletException e) {
      response.setStatus(400);
      response.resetBuffer();
      e.printStackTrace();
      out.println(e.toString());
    }
    out.close();
  }
 
  // implement remaining HTTP actions here
  ... 
 
}

Conclusion

This turned out to be a clear enough way to capture REST-like URIs and map them onto a servlet for processing. Regular expressions provide reliable and clear access to resource details and should be easy for future developers to quickly understand and extend.

In my next article, I’ll show you how I extend this servlet using the Jackson JSON library and a helper object to serialize and deserialize JSON payloads in RESTful requests.


Wicket + GAE automatic reload

One disappointment of developing for Wicket on Google App Engine (GAE) is that the automatic monitoring and reloading of modified HTML files didn't work. It has something to do with the single-threaded nature of the GAE platform.

I had found a few previous efforts to make this work, but none of them worked with the current version of Wicket and GAE. I went without it for a while, but restarting the web server after every markup change finally drove me to figure it out.

Working with the project that I setup using my Wicket + GAE tutorial, I added two new files and modified the WicketApplication. Here are the details.

GaeModificationWatcher.java

package com.danielwatrous.softwarelicensing.web;
 
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.Map.Entry;
 
import org.apache.wicket.util.listener.IChangeListener;
import org.apache.wicket.util.time.Duration;
import org.apache.wicket.util.time.Time;
import org.apache.wicket.util.watch.IModifiable;
import org.apache.wicket.util.watch.IModificationWatcher;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
 
public class GaeModificationWatcher implements IModificationWatcher {
 
	private static final Logger LOG = (Logger) LoggerFactory
			.getLogger(GaeModificationWatcher.class);
 
	ConcurrentHashMap<IModifiable, Set<IChangeListener>> listenersMap = new ConcurrentHashMap<IModifiable, Set<IChangeListener>>();
	Duration pollFrequency;
	Time lastCheckTime;
	Object timeCheckLock = new Object();
 
	public boolean add(IModifiable modifiable, IChangeListener listener) {
		checkResources();
		HashSet<IChangeListener> listenerSet = new HashSet<IChangeListener>();
		Set<IChangeListener> listeners = listenersMap.putIfAbsent(modifiable,
				listenerSet);
		if (listeners != null) {
			return listeners.add(listener);
		} else
			return listenerSet.add(listener);
	}
 
	public IModifiable remove(IModifiable modifiable) {
		if (listenersMap.remove(modifiable) != null) {
			return modifiable;
		} else {
			return null;
		}
	}
 
	public void start(Duration pollFrequency) {
		LOG.debug("Starting watcher");
		synchronized (timeCheckLock) {
			lastCheckTime = Time.now();
			this.pollFrequency = pollFrequency;
		}
	}
 
	public void destroy() {
		// do nothing
	}
 
	public Set<IModifiable> getEntries() {
		return listenersMap.keySet();
	}
 
	public void checkResources() {
		Time now = Time.now();
 
		Time timeCheck;
		synchronized (timeCheckLock) {
			if (lastCheckTime == null) {
				return; // not started
			}
 
			Time nextTimeCheck = lastCheckTime.add(pollFrequency);
			if (nextTimeCheck.after(now)) {
				return; // nothing to do, not ready
			}
 
			// lets go
			timeCheck = this.lastCheckTime;
			this.lastCheckTime = now;
		}
 
		Set<Entry<IModifiable, Set<IChangeListener>>> entrySet = new HashSet<Entry<IModifiable, Set<IChangeListener>>>(
				listenersMap.entrySet());
 
		for (Entry<IModifiable, Set<IChangeListener>> entry : entrySet) {
			if (timeCheck.before(entry.getKey().lastModifiedTime())) {
				LOG.debug("Found modification, notifying listeners of change");
				for (IChangeListener listener : entry.getValue()) {
					listener.onChange();
				}
			}
		}
	}
}

GaeReloadRequestCycleListener.java

package com.danielwatrous.softwarelicensing.web;
 
import org.apache.wicket.Application;
import org.apache.wicket.RuntimeConfigurationType;
import org.apache.wicket.request.cycle.AbstractRequestCycleListener;
import org.apache.wicket.request.cycle.RequestCycle;
 
public class GaeReloadRequestCycleListener extends AbstractRequestCycleListener {
 
	public void onBeginRequest(RequestCycle cycle) {
		if (Application.get().getConfigurationType().equals(RuntimeConfigurationType.DEVELOPMENT)) {
			final GaeModificationWatcher resourceWatcher = (GaeModificationWatcher) Application.get()
					.getResourceSettings().getResourceWatcher(true);
			resourceWatcher.checkResources();
		}	
	}
}

WicketApplication.java

I haven’t provided all the details for this class, but this should show you how to implement it.

public class WicketApplication extends WebApplication
{    	
	...
 
	/**
	 * @see org.apache.wicket.Application#init()
	 */
	@Override
	public void init()
	{
		super.init();
 
		...
 
		// add your configuration here
		getRequestCycleListeners().add(new GaeReloadRequestCycleListener());
		IModificationWatcher watcher = new GaeModificationWatcher();
		watcher.start(Duration.ONE_SECOND);
		getResourceSettings().setResourceWatcher(watcher);
	}
}

Resources

http://agilewombat.blogspot.com/2010/01/wicket-on-google-app-engine.html
http://apache-wicket.1842946.n4.nabble.com/How-can-I-reload-HTML-in-app-engine-td3005241.html
http://apache-wicket.1842946.n4.nabble.com/Reload-html-in-Wicket-GAE-td4363236.html