I’ve been working on generating analytics based on a collection containing statistical data. My previous attempt involved using Map Reduce in MongoDB. Recall that the data in the statistics collection has this form:

{
    "_id" : ObjectId("5e6877a516832a9c8fe89ca9"),
    "apikey" : "7e78ed1525b7568c2316576f2b265f55e6848b5830db4e6586283",
    "request_date" : ISODate("2013-04-05T06:00:24.006Z"),
    "request_method" : "POST",
    "document" : {
        "domain" : "",
        "validationMethod" : "LICENSE_EXISTS_NOT_EXPIRED",
        "deleted" : null,
        "ipAddress" : "",
        "disposition" : "",
        "owner" : ObjectId("af1459ed793eca35754090a0"),
        "_id" : ObjectId("6fec518787a52a9c988ea683"),
        "issueDate" : ISODate("2013-04-05T06:00:24.005Z")
    },
    "request_uri" : {
        "path" : "/v1/sitelicenses",
        "netloc" : "api.easysoftwarelicensing.com"
    }
}
......
I created a RESTful SaaS service that uses MongoDB. Each REST call creates a new record in a statistics collection. In order to implement quotas and provide user analytics, I need to process the statistics collection periodically and generate meaningful analytics specific to each user. This is just the type of problem map reduce was meant to solve. To accomplish this, I’ll need to do the following:

1. Map all statistics records over a time range
2. Reduce to count the calls, both authenticated and anonymous
3. Finalize to get the sum......
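A minimal in-memory sketch of those three phases follows, written in Python rather than as MongoDB JavaScript mapReduce functions so it runs without a server. It assumes apikey serves as the per-user key, and the authenticated flag is hypothetical: it stands in for however the service actually distinguishes authenticated calls from anonymous ones.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Iterator, List, Tuple

Counts = Dict[str, int]


def map_phase(records: Iterable[dict], start: datetime, end: datetime) -> Iterator[Tuple[str, Counts]]:
    """Emit (apikey, partial counts) for each record with start <= request_date < end."""
    for rec in records:
        if start <= rec["request_date"] < end:
            # "authenticated" is a hypothetical field, not part of the sample record.
            if rec.get("authenticated"):
                yield rec["apikey"], {"authenticated": 1, "anonymous": 0}
            else:
                yield rec["apikey"], {"authenticated": 0, "anonymous": 1}


def reduce_phase(key: str, values: List[Counts]) -> Counts:
    """Sum partial counts; the result has the same shape as each input value."""
    total = {"authenticated": 0, "anonymous": 0}
    for value in values:
        total["authenticated"] += value["authenticated"]
        total["anonymous"] += value["anonymous"]
    return total


def finalize_phase(key: str, value: Counts) -> Counts:
    """Add the overall number of calls once reduction for a key is complete."""
    return {**value, "total": value["authenticated"] + value["anonymous"]}


def map_reduce(records: Iterable[dict], start: datetime, end: datetime) -> Dict[str, Counts]:
    """Group emitted values by key, reduce each group, then finalize it."""
    grouped: Dict[str, List[Counts]] = defaultdict(list)
    for key, value in map_phase(records, start, end):
        grouped[key].append(value)
    return {key: finalize_phase(key, reduce_phase(key, values))
            for key, values in grouped.items()}
```

Because MongoDB may call reduce more than once for the same key, reduce_phase returns the same shape it receives; finalize runs once per key after reduction, which is why the total is computed there rather than in reduce.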
10gen offers a subscriber build of MongoDB which includes support for SSL communication between nodes in a replica set and between client and mongod. If the cost of a service subscription is prohibitive, it is possible to build MongoDB from source with SSL enabled. After downloading the subscriber build, I followed the process below to get it running. For a permanent solution, more attention should be given to where these binaries are installed and how upgrades are handled.

$ tar xzvf mongodb-linux-x86_64-subscription-rhel62-2.2.3.tgz
$ cp mongodb-linux-x86_64-subscription-rhel62-2.2.3/bin/* /usr/local/bin/

Next, it’s necessary to provide an SSL......
I’ve had several conversations recently about caching as it relates to big data. As a result of these discussions, I wanted to review some details that should be considered when deciding whether a cache is necessary and, when it is, how to cache big data.

What is a Cache?

The purpose of a cache is to duplicate frequently accessed or important data in such a way that it can be accessed very quickly and close to where it is needed. Caching generally moves data from a low cost, high density location (e.g.......
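To make the definition concrete, here is a small read-through cache sketch in Python. The loader callable stands in for the slower, high density store; the bounded capacity and least-recently-used eviction policy are assumptions chosen for illustration, not something prescribed here.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable


class ReadThroughCache:
    """Keep a bounded copy of recently used values in front of a slow loader."""

    def __init__(self, loader: Callable[[Hashable], Any], capacity: int = 128):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader
        self._capacity = capacity
        self._entries = OrderedDict()  # key -> value, least recently used first
        self.hits = 0
        self.misses = 0

    def get(self, key: Hashable) -> Any:
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = self._loader(key)  # slow path: go to the backing store
        self._entries[key] = value
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return value
```

The hit and miss counters are there because the hit ratio is what tells you whether the duplicated data is actually earning its keep.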
Another tool for monitoring the performance and health of a MongoDB node is mongostat. You’ll recall that mongotop shows the time in milliseconds that a mongo node spent accessing (reading and writing) a particular collection. mongostat, on the other hand, provides more detailed information about the state of a mongo node, including disk usage, data throughput, index misses, locks, etc. However, the data describes the node as a whole and doesn’t indicate which database or collection a given status refers to. As you would expect, both utilities, mongotop and mongostat, are required to get......
In the process of tuning the performance of a MongoDB replica set, it’s useful to be able to observe mongod directly, as opposed to inferring what it’s doing by watching the output of top, for example. For that reason MongoDB comes with a utility, mongotop. The output of mongotop indicates the amount of time the mongod process spends reading and writing to a specific collection during the update interval. I used the following command to run mongotop on an authentication-enabled replica set with a two second interval.

[watrous@d1t0156g ~]# mongotop -p -u......
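mongotop gets its numbers by polling the server's cumulative per-collection counters and reporting the difference between successive samples. The Python sketch below shows that delta calculation. The snapshot layout (namespace mapped to cumulative total, readLock and writeLock microseconds) is a simplified assumption loosely modeled on the `top` admin command, whose real output nests a time and a count under each field, and interval_report is a hypothetical name.

```python
from typing import Dict, List, Tuple

# namespace -> cumulative microseconds per field (simplified, assumed layout)
Snapshot = Dict[str, Dict[str, int]]


def interval_report(previous: Snapshot, current: Snapshot) -> List[Tuple[str, float, float, float]]:
    """Return (namespace, total_ms, read_ms, write_ms) rows for one interval, busiest first."""
    rows = []
    for ns, cur in current.items():
        # A namespace first seen in this sample starts from zero.
        prev = previous.get(ns, {"total": 0, "readLock": 0, "writeLock": 0})
        total_ms, read_ms, write_ms = (
            (cur[field] - prev[field]) / 1000.0  # microseconds -> milliseconds
            for field in ("total", "readLock", "writeLock")
        )
        rows.append((ns, total_ms, read_ms, write_ms))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

Taking two snapshots two seconds apart and passing them to interval_report yields one screen's worth of the per-collection view described above.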
MongoDB connections accommodate a ReadPreference, which, in a clustered environment like a replica set, indicates how to select the best host for a query. One major consideration when setting the read preference is whether or not you can live with eventually consistent reads, since SECONDARY hosts may lag behind the PRIMARY. Some of the options you can choose include:

- PRIMARY: This will ensure the most consistency, but also concentrates all your queries on a single host.
- SECONDARY: This will distribute your queries among secondary nodes, which may lag in consistency with the primary.
- primaryPreferred:......
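The selection rules for the modes listed above can be sketched as follows. This is a simplified Python model, not any driver's implementation: real drivers also weigh network latency and tag sets, and Member and select_host are hypothetical names.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Member:
    """Hypothetical view of one replica set member as a client sees it."""
    host: str
    is_primary: bool
    is_up: bool = True


def select_host(members: List[Member], mode: str,
                rng: Optional[random.Random] = None) -> Optional[Member]:
    """Pick a member for a read according to a simplified read preference."""
    rng = rng or random.Random()
    available = [m for m in members if m.is_up]
    primary = next((m for m in available if m.is_primary), None)
    secondaries = [m for m in available if not m.is_primary]

    if mode == "PRIMARY":
        # Strongest consistency; every read lands on one host.
        return primary
    if mode == "SECONDARY":
        # Spread reads across secondaries, which may lag the primary.
        return rng.choice(secondaries) if secondaries else None
    if mode == "primaryPreferred":
        # Use the primary when it is reachable, otherwise fall back to a secondary.
        if primary is not None:
            return primary
        return rng.choice(secondaries) if secondaries else None
    raise ValueError(f"mode not covered by this sketch: {mode}")
```

Returning None when no suitable member is available keeps the sketch short; a real driver would report an error to the caller instead.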
Authentication in MongoDB provides ‘normal’ (full read and write) or ‘readonly’ access at a database level. There are two scenarios where authentication comes into play: single server and multi-server. When using a single server, authentication can be enabled by adding --auth to the startup parameters. When using a replica set, a sharded cluster, or a combination of the two, a key file must be provided and the --keyFile parameter used at startup. This enables each node to communicate with other nodes using a nonce scheme based on the keyFile. In this configuration, --auth is implied and the......
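A key file can be generated with a short Python script like the one below. It assumes the constraints MongoDB documents for key files (between 6 and 1024 base64 characters, and on Unix no group or world permissions); write_keyfile is a hypothetical helper, and 741 random bytes mirrors the `openssl rand -base64 741` command commonly suggested for the same purpose.

```python
import base64
import os
import stat


def write_keyfile(path: str, num_bytes: int = 741) -> str:
    """Write a random base64 key for use with mongod's --keyFile option.

    Assumed constraints: 6 to 1024 base64 characters, and a file that is not
    readable by group or others on Unix systems.
    """
    key = base64.b64encode(os.urandom(num_bytes)).decode("ascii")
    if not 6 <= len(key) <= 1024:
        raise ValueError(f"{num_bytes} bytes encode to {len(key)} characters; need 6-1024")
    # Create the file with owner-only permissions (the umask can only narrow them).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write(key)
    # If the file already existed, the mode passed to os.open was ignored, so enforce 0600.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return key
```

Every member of the replica set or sharded cluster then starts with --keyFile pointing at a copy of the same file, since the nodes authenticate each other with the shared key.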
I’ve been working on some HP-UX systems recently and had way too much trouble finding the solution to an issue with how vi, top and other programs displayed in the terminal. For example, when I started vi, rather than clearing the screen it would just overwrite the lowest line in the terminal with 23y0C1A0y0C~0y0CC56C64C72C. It appeared that vi commands would work, but I couldn’t see anything that was happening. My initial attempt at setting TERM failed to produce any results. The final solution is pretty easy: I changed my shell to......
Last week I spent way too much time integrating Apache Wicket and Google Guice. Yikes! The most difficult part for me was getting the initialization to happen in the right order. A big thank you to Dan Retzlaff on the Wicket list for helping work through these details. The details below were applied to a Wicket quickstart project for Wicket 6.0.0.

Design Decisions

It was important to me to keep the application tier separate from the web tier. I actually maintain each in a separate repository. I have several motivations for this, such as:......