Daniel Watrous on Software and Cloud Engineering – Page 9 – A collection of Software and Cloud patterns with a focus on the Enterprise

As the scale of web applications increases, performance optimization considerations are more frequently included in initial design. One optimization technique used extensively is caching. A cache contains pre-processed data that is ready to use without redoing the processing. Processing may include extensive computation and accessing data over a network or on disk. Keep the details out of your code One design consideration when introducing a caching mechanism in your code is to keep the details out of your code. Most caches are just simple key value stores, and so it’s tempting to introduce......

Continue Reading

A few days ago I wrote about how to structure version details in MongoDB. In this and subsequent articles I’m going to present a Java based approach to working with that revision data. I have published all this work as an open source repository on github. Feel free to fork it: https://github.com/dwatrous/mongodb-revision-objects Design Decisions To begin with, here are a few design rules that should direct the implementation: Program to interfaces. Choice of datastore or other technologies should not be visible in application code Application code should never deal with versioned objects. It......

Continue Reading

The need to track changes to web content and provide for draft or preview functionality is common to many web applications today. In relational databases it has long been common to accomplish this using a recursive relationship within a single table or by splitting the table out and storing version details in a secondary table. I’ve recently been exploring the best way to accomplish this using MongoDB. A few design considerations Data will be represented in three primary states, published, draft and history. These might also be called current and preview. From a......

Continue Reading

I’ve been working on generating analytics based on a collection containing statistical data. My previous attempt involved using Map Reduce in MongoDB. Recall that the data in the statistics collection has this form. { "_id" : ObjectId("5e6877a516832a9c8fe89ca9"), "apikey" : "7e78ed1525b7568c2316576f2b265f55e6848b5830db4e6586283", "request_date" : ISODate("2013-04-05T06:00:24.006Z"), "request_method" : "POST", "document" : { "domain" : "", "validationMethod" : "LICENSE_EXISTS_NOT_EXPIRED", "deleted" : null, "ipAddress" : "", "disposition" : "", "owner" : ObjectId("af1459ed793eca35754090a0"), "_id" : ObjectId("6fec518787a52a9c988ea683"), "issueDate" : ISODate("2013-04-05T06:00:24.005Z"), }, "request_uri" : { "path" : "/v1/sitelicenses", "netloc" : "api.easysoftwarelicensing.com" } }{ "_id" : ObjectId("5e6877a516832a9c8fe89ca9"), "apikey" : "7e78ed1525b7568c2316576f2b265f55e6848b5830db4e6586283", "request_date" :......

Continue Reading

I have a RESTful SaaS service I created which uses MongoDB. Each REST call creates a new record in a statistics collection. In order to implement quotas and provide user analytics, I need to process the statistics collection periodically and generate meaningful analytics specific to each user. This is just the type of problem map reduce was meant to solve. In order to accomplish this I’ll need to do the following: Map all statistics records over a time range Reduce the number of calls, both authenticated and anonymous Finalize to get the sum......

Continue Reading

10gen offers a subscriber build of MongoDB which includes support for SSL communication between nodes in a replicaset and between client and mongod. If the cost of a service subscription is prohibitive, it is possible to build it with SSL enabled. After download, I followed the process below to get it running. For a permanent solution, more attention should be given to where these are installed and how upgrades are handled. $ tar xzvf mongodb-linux-x86_64-subscription-rhel62-2.2.3.tgz $ cp mongodb-linux-x86_64-subscription-rhel62-2.2.3/bin/* /usr/local/bin/$ tar xzvf mongodb-linux-x86_64-subscription-rhel62-2.2.3.tgz $ cp mongodb-linux-x86_64-subscription-rhel62-2.2.3/bin/* /usr/local/bin/ Next, it’s necessary to provide an SSL......

Continue Reading

I’ve had several conversations recently about caching as it relates to big data. As a result of these discussions I wanted to review some details that should be considered when deciding if a cache is necessary and how to cache big data when it is necessary. What is a Cache? The purpose of a cache is to duplicate frequently accessed or important data in such a way that it can be accessed very fast and close to where it is needed. Caching generally moves data from a low cost, high density location (e.g.......

Continue Reading

Another tool for monitoring the performance and health of a MongoDB node is mongostat. You’ll recall that mongotop shows the time in milliseconds that a mongo node spent accessing (read and write) a particular collection. mongostat on the other hand provides more detailed information about the state of a mongo node, including disk usage, data throughput, index misses, locks, etc. However, the data is general to the mongo node and doesn’t indicate which database or collection the status refers to. As you would expect, both utilities, mongotop and mongostat, are required to get......

Continue Reading

In the process of tuning the performance of a MongoDB replica set, it’s useful to be able to observe mongod directly, as opposed to inferring what it’s doing by watching the output of top, for example. For that reason MongoDB comes with a utility, mongotop. The output of mongotop indicates the amount of time the mongod process spend reading and writing to a specific collection during the update interval. I used the following command to run mongotop on an authentication enabled replica set with a two second interval. [watrous@d1t0156g ~]# mongotop -p -u......

Continue Reading

MongoDB connections accommodate a ReadPreference, which in a clustered environment, like a replicaset, indicates how to select the best host for a query. One major consideration when setting the read preference is whether or not you can live with eventually consistent reads, since SECONDARY hosts may lag behind the PRIMARY. Some of the options you can choose include: PRIMARY: This will ensure the most consistency, but also concentrates all your queries on a single host. SECONDARY: This will distribute your queries among secondary nodes and may lag in consistency with the primary primaryPreferred:......

Continue Reading

Blog