Today: November 16, 2025 5:27 pm
A collection of Software and Cloud patterns with a focus on the Enterprise

Tag: big data


In a previous post I illustrated the use of Hadoop to analyze Apache Tomcat log files (catalina.out). Below I perform the same Tomcat log analysis using PIG. The motivation behind PIG is the ability us a descriptive language to analyze large sets of data rather than writing code to process it, using Java or Python for example. PIG latin is the descriptive query language and has some similarities with SQL. These include grouping and filtering. Load in the data First I launch into the interactive local PIG command line, grunt. Commands are not......

Continue Reading


Big Data Cache Approaches

I’ve had several conversations recently about caching as it relates to big data. As a result of these discussions I wanted to review some details that should be considered when deciding if a cache is necessary and how to cache big data when it is necessary. What is a Cache? The purpose of a cache is to duplicate frequently accessed or important data in such a way that it can be accessed very fast and close to where it is needed. Caching generally moves data from a low cost, high density location (e.g.......

Continue Reading