A collection of Software and Cloud patterns with a focus on the Enterprise

Tag: bzip2


The log rotation mechanism on my servers automatically compresses (gzip) each rotated log file to save disk space. I discovered that Hadoop is already designed to deal with files compressed using gzip, bzip2 and LZO out of the box. This means that no additional work is required in the Mapper class to decompress them. Here’s a snippet from the MapReduce output that shows a gzipped log file being processed as an input split:

    13/11/15 22:01:46 INFO mapred.MapTask: Processing split: hdfs://localhost/user/watrous/log_myhost.com/catalina.out-20131103.gz:0+3058954
    13/11/15 22:01:46 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
    13/11/15 22:01:46 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
    13/11/15 22:01:46 INFO mapred.MapTask: mapreduce.task.io.sort.mb:......
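To make the idea concrete without a cluster, here is a minimal sketch in plain Java of extension-based decompression, which is roughly what happens before lines reach a Mapper. It is an illustration only and not Hadoop's code: as far as I know Hadoop resolves the codec through its `CompressionCodecFactory`, while the class `TransparentLogReader` and its methods below are hypothetical names that use only `java.util.zip` and handle gzip alone.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: mimics choosing a decompressor from the file name.
public class TransparentLogReader {

    // Open a file, wrapping it in a gzip decompressor when the name ends in ".gz".
    static InputStream open(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        if (!file.getFileName().toString().endsWith(".gz")) {
            return raw; // uncompressed: hand back the raw stream
        }
        try {
            return new GZIPInputStream(raw);
        } catch (IOException e) {
            raw.close(); // not valid gzip data: release the file handle
            throw e;
        }
    }

    // Read every line; the caller never needs to know whether the file was compressed.
    static List<String> readLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(open(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Build a small gzipped "rotated log" in a temp file (made-up content).
        Path gz = Files.createTempFile("catalina.out-", ".gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(gz))) {
            out.write("INFO first line\nINFO second line\n".getBytes(StandardCharsets.UTF_8));
        }
        for (String line : readLines(gz)) {
            System.out.println(line);
        }
        Files.delete(gz);
    }
}
```

The point of the design is that the line-consuming code (the Mapper, in Hadoop's case) sees the same input whether or not the file on disk is compressed.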
