Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged security

Software Engineering

JWT based authentication in Python bottle

Many applications require authentication to secure protected resources. While standards like oAuth accommodate sharing resources between applications, more variance exists in how applications secure access in the first place. A recent standard, JWT (JSON Web Token), provides a mechanism for creating tokens with embedded data, signing these tokens and even encrypting them when warranted.

This post explores how individual resource functions can be protected using JWT. The solution involves first creating a function decorator to perform the authentication step. Each protected resource call is then decorated with the authentication function and subsequent authorization can be performed against the data in the JWT. Let’s first look at the decorator.

import bottle
import jwt    # PyJWT
 
# config is assumed to be an application configuration object loaded elsewhere
jwtsecret = config.authentication.get('jwtsecret')
 
class AuthorizationError(Exception):
    """ A base class for exceptions used by bottle. """
    pass
 
def jwt_token_from_header():
    auth = bottle.request.headers.get('Authorization', None)
    if not auth:
        raise AuthorizationError({'code': 'authorization_header_missing', 'description': 'Authorization header is expected'})
 
    parts = auth.split()
 
    if parts[0].lower() != 'bearer':
        raise AuthorizationError({'code': 'invalid_header', 'description': 'Authorization header must start with Bearer'})
    elif len(parts) == 1:
        raise AuthorizationError({'code': 'invalid_header', 'description': 'Token not found'})
    elif len(parts) > 2:
        raise AuthorizationError({'code': 'invalid_header', 'description': 'Authorization header must be Bearer + \s + token'})
 
    return parts[1]
 
def requires_auth(f):
    """Provides JWT based authentication for any decorated function assuming credentials available in an "Authorization" header"""
    def decorated(*args, **kwargs):
        try:
            token = jwt_token_from_header()
        except AuthorizationError, reason:
            bottle.abort(400, reason.message)
 
        try:
            token_decoded = jwt.decode(token, jwtsecret)    # throw away value
        except jwt.ExpiredSignature:
            bottle.abort(401, {'code': 'token_expired', 'description': 'token is expired'})
        except jwt.DecodeError, message:
            bottle.abort(401, {'code': 'token_invalid', 'description': message.message})
 
        return f(*args, **kwargs)
 
    return decorated

In the above code the requires_auth(f) function makes use of a helper function to verify that there is an Authorization header and that it appears to contain the expected token. A custom exception is used to indicate a failure to identify a token in the header.

The requires_auth function then uses the Python JWT library to decode the token based on a secret value jwtsecret. The secret is obtained from a config object. Assuming the JWT decodes and is not expired, the decorated function will then be called.

Authenticate

The following function can be used to generate a new JWT.

import time
import logging
 
jwtexpireoffset = config.authentication.get('jwtexpireoffset')
jwtalgorithm = config.authentication.get('jwtalgorithm')
 
def build_profile(credentials):
    return {'user': credentials['user'],
            'role1': credentials['role1'],
            'role2': credentials['role2'],
            'exp': time.time()+jwtexpireoffset}
 
@bottle.post('/authenticate')
def authenticate():
    # extract credentials from the request
    credentials = bottle.request.json
    if not credentials or 'user' not in credentials or 'password' not in credentials:
        bottle.abort(400, 'Missing or bad credentials')
 
    # authenticate against some identity source, such as LDAP or a database
    try:
        # query database for username and confirm password
        # or send a query to LDAP or oAuth
        pass    # placeholder for the actual credential check
    except Exception, error_message:
        logging.exception("Authentication failure")
        bottle.abort(403, 'Authentication failed for %s: %s' % (credentials['user'], error_message))
 
    credentials['role1'] = is_authorized_role1(credentials['user'])
    credentials['role2'] = is_authorized_role2(credentials['user'])
    token = jwt.encode(build_profile(credentials), jwtsecret, algorithm=jwtalgorithm)
 
    logging.info('Authentication successful for %s' % (credentials['user']))
    return {'token': token}

Notice that two additional values are stored in the global configuration, jwtalgorithm and jwtexpireoffset. These are used along with jwtsecret to encode the JWT token. The actual verification of user credentials can happen in many ways, including direct access to a datastore, LDAP, oAuth, etc. After authenticating credentials, it’s easy to authorize a user based on roles. These could be implemented as separate functions and could confirm role based access based on LDAP group membership, database records, oAuth scopes, etc. While the role level access shown above looks binary, it could easily be more granular. Since a JWT is based on JSON, the JWT payload is represented as a JSON serializable python dictionary. Finally the token is returned.
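
As an illustration, a role check like is_authorized_role1 could be implemented against a datastore as sketched below. The role_assignments collection and the module-level datastore handle are assumptions made for the example; an LDAP group lookup or an oAuth scope check could be substituted without changing the caller in authenticate.

def is_authorized_role1(username):
    """Return True if the user holds role1 according to some identity source."""
    # 'datastore' and the 'role_assignments' collection are hypothetical;
    # substitute an LDAP group membership or oAuth scope check as needed
    record = datastore.role_assignments.find_one({'user': username, 'role': 'role1'})
    return record is not None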

Protected resources

At this point, any protected resource can be decorated, as shown below.

def get_jwt_credentials():
    # get and decode the current token
    token = jwt_token_from_header()
    credentials = jwt.decode(token, jwtsecret)
    return credentials
 
# appv1 is assumed to be a bottle.Bottle() application instance defined elsewhere
@appv1.get('/protected/resource')
@requires_auth
def get_protected_resource():
    # get user details from JWT
    authenticated_user = get_jwt_credentials()
 
    # get protected resource
    try:
        return {'resource': somedao.find_protected_resource_by_username(authenticated_user['user'])}
    except Exception, e:
        logging.exception("Resource not found")
        bottle.abort(404, 'No resource for username %s was found.' % authenticated_user['user'])

The function get_protected_resource will only be executed if requires_auth successfully validates a JWT in the header of the request. The function get_jwt_credentials will actually retrieve the JWT payload to be used in the function. While I don’t show an implementation of somedao, it is simply an encapsulation point to facilitate access to resources.
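
For completeness, somedao could be as simple as the following sketch. The MongoDB-style query and the resources collection name are illustrative assumptions, not part of the original implementation.

class SomeDao(object):
    """Hypothetical data access object that encapsulates the persistence layer."""
    def __init__(self, db):
        self.db = db    # e.g. a pymongo database handle configured elsewhere

    def find_protected_resource_by_username(self, username):
        # illustrative query; collection and field names are assumptions
        return self.db.resources.find_one({'owner': username})

somedao = SomeDao(db)    # db is assumed to be created during application setup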

Since the JWT expires (optionally, but a good idea), it’s necessary to build in some way to extend the ‘session’. For this a refresh endpoint can be provided as follows.

@bottle.post('/authenticate/refresh')
@requires_auth
def refresh_token():
    """refresh the current JWT"""
    # get and decode the current token
    token = jwt_token_from_header()
    payload = jwt.decode(token, jwtsecret)
    # create a new token with a new exp time
    token = jwt.encode(build_profile(payload), jwtsecret, algorithm=jwtalgorithm)
 
    return {'token': token}

This simply repackages the same payload with a new expiration time.

Improvements

The need to explicitly refresh the JWT increases (possibly doubles) the number of requests made to an API solely for the purpose of extending session life. This is inefficient and can lead to awkward UI design. If possible, it would be convenient to refactor requires_auth to perform the JWT refresh and add the new JWT to the headers of the response that is about to be returned. The UI could then grab the updated JWT that accompanies each response and use it for the subsequent request.

The design above will actually decode the JWT twice for any resource function that requires access to the JWT payload. If possible, it would be better to find some way to inject the JWT payload into the decorated function. Ideally this would be done in a way that functions which don’t need the JWT payload aren’t required to add it to their contract.

The authenticate function could be modified to return the JWT as a header, rather than in the body of the response. This may decrease the chances of a JWT being cached or logged. It would also simplify the UI if the authenticate and refresh functions both returned the JWT in the same manner.
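
Here is a minimal sketch of how requires_auth could be reworked along those lines, assuming the same jwtsecret, jwtexpireoffset and jwtalgorithm configuration shown earlier. The X-Renewed-JWT response header name is an arbitrary choice, and attaching the payload as a new attribute on bottle.request (which bottle allows for per-request state) is just one way to expose it to resource functions.

def requires_auth(f):
    """Validate the JWT, re-issue it with a new expiration, and expose the payload."""
    def decorated(*args, **kwargs):
        try:
            token = jwt_token_from_header()
            payload = jwt.decode(token, jwtsecret)
        except AuthorizationError as reason:
            bottle.abort(400, reason.args[0])
        except jwt.ExpiredSignature:
            bottle.abort(401, {'code': 'token_expired', 'description': 'token is expired'})
        except jwt.DecodeError:
            bottle.abort(401, {'code': 'token_invalid', 'description': 'token is invalid'})

        # re-issue the token with a fresh expiration and return it as a response header
        payload['exp'] = time.time() + jwtexpireoffset
        bottle.response.set_header('X-Renewed-JWT',
                                   jwt.encode(payload, jwtsecret, algorithm=jwtalgorithm))

        # stash the decoded payload so resource functions don't decode the JWT again
        bottle.request.jwt_payload = payload

        return f(*args, **kwargs)

    return decorated

A resource function could then read bottle.request.jwt_payload instead of calling get_jwt_credentials, and the client would simply replace its stored token with the X-Renewed-JWT value after each response, making the explicit refresh endpoint unnecessary.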

Extending

This same implementation could be reproduced in any language or framework. The basic steps are

  1. Perform (role based) authentication and authorization against some identity resource
  2. Generate a token (like JWT) indicating success and optionally containing information about the authenticated user
  3. Transmit the token and refreshed tokens in HTTP Authorization headers, both for authenticate and resource requests (see the client sketch below)
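
To illustrate step 3, here is a minimal client-side sketch using the requests library. The host is a placeholder, and the paths assume the routes shown earlier are mounted at the application root.

import requests

BASE = 'http://localhost:8080'    # placeholder host for the bottle application

# authenticate once and keep the returned JWT
token = requests.post(BASE + '/authenticate',
                      json={'user': 'alice', 'password': 'secret'}).json()['token']

# present the JWT as a Bearer token on every resource request
headers = {'Authorization': 'Bearer %s' % token}
resource = requests.get(BASE + '/protected/resource', headers=headers).json()

# periodically refresh the token before it expires
token = requests.post(BASE + '/authenticate/refresh', headers=headers).json()['token']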

Security and Risks

At the heart of JWT security is the secret used to sign the JWT. If this secret is too simple, or if it is leaked, it would be possible for a third party to craft a JWT with any desired payload and trick an application into delivering protected resources to an attacker. It is important to choose strong secrets and to rotate them frequently. It would also be wise to perform additional validity checks. These might include tracking how many sessions a user has, where those sessions have originated and the nature and frequency/speed of requests. These additional measures could prevent attacks in the event that a JWT secret was discovered and may indicate a need to rotate a secret.
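
If the application runs on Python 3.6 or later, the standard library secrets module is one convenient way to generate a suitably strong signing secret (shown here only as a sketch; any cryptographically secure random source works):

import secrets

# 64 bytes of randomness, URL-safe encoded; store the value securely and rotate it periodically
jwtsecret = secrets.token_urlsafe(64)
print(jwtsecret)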

Microservices

In microservice environments, it is appealing to authenticate once and access multiple microservice endpoints using the same JWT. Since a JWT is stateless, each microservice only needs the JWT secret in order to validate the signature. This potentially increases the attack surface for an attacker who wants the JWT secret.

Software Engineering

Use oAuth to Register Users on My Site using Social Media Credentials

I’m interested in allowing a user to register on my site/app using their social account credentials (e.g. Google, Facebook, LinkedIn, etc.). It should also be possible to register using an email address. Since the site/app will be composed of a handful of microservices, I would want to provide my own identity service, which might include profile information and roles. This should be possible with oAuth.

I found plenty of examples of how to use oAuth against someone’s social accounts. What I didn’t find were any examples of how to manage user registration and possibly ongoing authentication against a social account. I also didn’t see any examples of how to mix a social oAuth server with an internal oAuth server. The internal oAuth server would provide authentication for each microservice consumed by the site/app. It seemed awkward to keep validating access tokens against the social account oAuth server for each request to local microservices, so this design uses the social access token to get an access token against the internal oAuth server. Here’s how that looks:

[Diagram: social-oauth-register-third-party]

As you can see, the access token is used to get the initial data to create a user (register) in the internal oAuth server. After registration, the user can still authenticate using their social account, but the account wouldn’t be created a second time. Also notice that the social access token is used to generate the authorization code and eventually the access token for the internal oAuth server, instead of going back to the user for confirmation. In other words, a valid access token from the social oAuth server presumes the user has already logged in to authorize access. The oAuth access token from the internal oAuth server is used to authenticate all calls to internal microservices.
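
A rough sketch of that exchange in Python might look like the following. The endpoint URLs, the token-exchange grant type and the field names are purely hypothetical placeholders for whatever the social provider and the internal oAuth server actually expose.

import requests

SOCIAL_TOKENINFO_URL = 'https://social.example.com/oauth2/tokeninfo'      # hypothetical
INTERNAL_TOKEN_URL = 'https://auth.internal.example.com/oauth/token'      # hypothetical

def register_or_login(social_access_token):
    # 1. validate the social access token and fetch basic profile data
    profile = requests.get(SOCIAL_TOKENINFO_URL,
                           params={'access_token': social_access_token}).json()

    # 2. exchange the validated social token for an internal access token;
    #    the internal oAuth server registers the user on first contact and
    #    skips the usual user-facing consent step
    internal = requests.post(INTERNAL_TOKEN_URL, data={
        'grant_type': 'urn:example:social-token-exchange',    # hypothetical grant type
        'social_token': social_access_token,
        'email': profile['email'],
    }).json()

    # the internal access token is then used for all microservice calls
    return internal['access_token']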

References:

http://nordicapis.com/how-to-control-user-identity-within-microservices/
http://www.bubblecode.net/en/2016/01/22/understanding-oauth2/
http://stackoverflow.com/questions/29644916/microservice-authentication-strategy

Software Engineering

The Road to PaaS

I have observed that discussions about CloudFoundry often lack accurate context. Some questions I get that indicate context is missing include:

  • What Java version does CloudFoundry support?
  • What database products/versions are available?
  • How can I access the server directly?

There are a few reasons that the questions above are not relevant for CloudFoundry (or any modern PaaS environment). To understand why, it’s important to understand how we got to PaaS and where we came from.

cloudfoundry-compared-traditional

Landscape

When computers were first becoming a common requirement for the enterprise, most applications were monolithic. All application components would run on the same general purpose server. This included the interface, application technology (e.g. Java, .NET or PHP) and data and file storage. Over time, these functions were distributed across different servers. The servers also began to take on characteristic differences that would accommodate the technology being run.

Today, compute has been commoditized and virtualized. Rather than thinking of compute as a physical server built to suit a specific purpose, compute is instead viewed in discrete chunks that can be scaled horizontally. PaaS today marries an application with those chunks of compute capacity as needed and abstracts application access to services, which may or may not run on the same PaaS platform.

Contributor and Organization Dynamic

The roles of contributors and organizations have changed throughout the evolution of the landscape. Early monolithic systems required technology experts who were familiar with a broad range of technologies, including system administration, programming, networking, etc. As the functions were distributed, the roles became more defined by their specializations. Webmasters, DBAs, and programmers became siloed. This more distributed architecture introduced some unintended conflicts, due in part to the fact that efficiencies in one silo did not always align with the best interests of other silos.

DevOps

As the evolution pushed toward compute as a commodity, the newfound flexibility drove many frustrated technologists to reach beyond their respective silos to accomplish their design and delivery objectives. Programmers began to look at how different operating system environments and database technologies could enable them to produce results faster and more reliably. System administrators began to rethink system management in ways that abstracted hardware dependencies and decreased the complexity involved in augmenting the compute capacity available to individual functions. Datastore, network, storage and other experts began a similar process of abstracting their offerings. This blending of roles and new dynamic of collaboration and contribution has come to be known as DevOps.

Interoperability

Interoperability between systems and applications in the days of monolithic application development made use of many protocols. This was due in part to the fact that each monolithic system exposed its services in different ways. As the above progression took place, the field of available protocols normalized. RESTful interfaces over HTTP have emerged as an accepted standard, and the serialization structures most common to REST are XML and JSON. This makes integration straightforward and provides for a high amount of reuse of existing services. This also makes services available to a greater diversity of devices.

Security and Isolation

One key development that made this evolution from compute as hardware to compute as a utility possible was effective isolation of compute resources on shared hardware. The first big step in this direction came in the form of virtualization. Virtualized hardware made it possible to run many distinct operating systems simultaneously on the same hardware. It also significantly reduced the time to provision new server resources, since the underlying hardware was already wired and ready.

Compute as a ________

The next step in the evolution came in the form of containers. Unlike virtualization, containers made it possible to provide an isolated, configurable compute instance in much less time and with fewer system resources to create and manage (i.e. lightweight). This progression from compute as hardware to compute as virtual and finally to compute as a container made it realistic to literally view compute as discrete chunks that could be created and destroyed in seconds as capacity requirements changed.

Infrastructure as Code

Another important observation regarding the evolution of compute is that as the compute environment became easier to create (time to provision decreased), the process to provision changed. When a physical server required ordering, shipping, mounting, wiring, etc., it was reasonable to take a day or two to install and configure the operating system, network and related components. When that hardware was virtualized and could be provisioned in hours (or less), system administrators began to pursue more automation to accommodate the setup of these systems (e.g. ansible, puppet, chef and even Vagrant). This made it possible to think of systems as more transient. With the advent of Linux containers, the idea of infrastructure as code became even more prevalent. Time to provision is approaching zero.

A related byproduct of infrastructure defined by scripts or code was reproducibility. Whereas it was historically difficult to ensure that two systems were configured identically, the method for provisioning containers made it trivial to ensure that compute resources were identically configured. This in turn improved debugging and collaboration and accommodated versioning of operating environments.

Contextual Answers

Given that the landscape has changed so drastically, let’s look at some possible answers to the questions from the beginning of this post.

  • Q. What Java (or any language) version does CloudFoundry support?
    A. It supports any language that is defined in the scripts used to provision the container that will run the application. While it is true that some such scripts may be available by default, this doesn’t imply that the PaaS provides only that. If it’s a fit, use it. If not, create new provisioning scripts.
  • Q. What database products/versions are available?
    A. Any database product or version can be used. If the datastore services available that are associated with the PaaS by default are not sufficient, bring your own or create another application component to accommodate your needs.
  • Q. How can I access the server directly?
    A. There is no “the server.” If you want to know more about the server environment, look at the script/code that is responsible for provisioning it. Even better, create a new container and play around with it. Once you get things just right, update your code so that every new container incorporates the desired changes. Every “the server” will look exactly how you define it.

Software Engineering

A Review of Docker

The most strikingly different characteristic of Docker, when compared to other deployment platforms, is the single responsibility per container design (although some see it differently). One reason this looks so different is that many application developers view the complete software stack on which they deploy as a collection of components on a single logical server. For developers of larger applications, who already have experience deploying distributed stacks, the security and configuration complexity of Docker may feel more familiar. Docker brings a fresh approach to distributed stacks, but one that may seem overly complex to developers of smaller applications who are used to the convenience of deploying their full stack to a single logical server.

Link to create applications

Docker does mitigate some of the complexity of a distributed stack by way of Linking. Linking is a way to connect multiple containers so that they have access to each other’s resources. Communication between linked containers happens over a private network between the two containers. Each container has a unique IP address on the private network. We’ll see later on that shared volumes are a special case in linking containers.

Statelessness and Persistence

One core concept behind Docker containers is that they are transient. They are fast and easy to start, stop and destroy. Once stopped, any resources associated with the running container are immediately returned to the system. This stateless approach can be a good fit for modern web applications, where statelessness simplifies scaling and concurrency. However, the question remains about what to do with truly persistent data, like records in a database.

Docker answers this question of persistence with Volumes. At first glance this appears to only provide persistence between running containers, but it can be configured to share data between host and container that will survive after the container exits.

It’s important to note that storing data outside the container breaches one isolation barrier and could be an attack vector back into any container that uses that data. It’s also important to understand that any data stored outside the container may require infrastructure to manage, backup, sync, etc., since Docker only manages containers.

Infrastructure Revision Control

Docker elevates infrastructure dependencies one level above system administration by encapsulating application dependencies inside a single container. This encapsulation makes it possible to maintain versioned deployment artifacts, either as Docker Buildfile or in binary form. This enables some interesting possibilities, such as testing a new server configuration or redeploying an old server configuration in minutes.

Two Ways to Build Docker Container Images

Docker provides two ways to create a new container.

  1. Buildfile
  2. Modify an existing container

Buildfile

A Buildfile is similar to a Vagrantfile. It references a base image (starting point) and a number of tasks to execute on that base image to arrive at a desired end state. For example, one could start with the Ubuntu base image and run a series of apt-get commands to install the Nginx web server and copy the default configuration. After the image is created, it can be used to create new containers that have those dependencies ready to go.

Images are containers in embryo, similar to how a class is an object in embryo.

A Buildfile can also be added to a git repository and builds can be automatically triggered whenever a change is committed against the Buildfile.

Modify an existing container

Rather than using a Buildfile, which is a text file containing commands, it is also possible to start a container from an existing image and run ‘/bin/bash’. From the bash prompt any desired changes can be made. These changes modify the running container, which can then be committed as a new image into the DockerHub repository or stored elsewhere for later use.

In either case, the result is a binary image that can be used to create a container providing a specific dependency profile.

Scaling Docker

Docker alone doesn’t answer the question about how to scale out containers, although there are a lot of projects trying to answer that question. It’s important to know that containerizing an application doesn’t automatically make it easier to scale. It is necessary to create logic to build, monitor, link, distribute, secure, update and otherwise manage containers.

Not Small VMs

It should be obvious by this point that Docker containers are not intended to be small Virtual Machines. They are isolated, single function containers that should be responsible for a single task and linked together to provide a complete software stack. This is similar to the Single Responsibility Principle. Each container should have a single responsibility, which increases the likelihood of reuse and decreases the complexity of ongoing management.

Application Considerations

I would characterize most of the discussion above as infrastructure considerations. There are several application specific considerations to review.

PaaS Infection of Application Code

Many PaaS solutions infect application code. This may be in the form of requiring use of certain libraries, specific language versions or adhering to specific resource structures. The trade-off promise is that in exchange for the rigid application requirements, the developer enjoys greater ease and reliability when deploying and scaling an application and can largely ignore system level concerns.

The container approach that Docker takes doesn’t infect application code, but it drastically changes deployment. Docker is so flexible in fact, that it becomes possible to run different application components with different dependencies, such as differing versions of the same programming language. Application developers are free to use any dependencies that suit their needs and develop in any environment that they like, including a full stack on a single logical server. No special libraries are required.

While this sounds great, it also increases application complexity in several ways, some of which are unexpected. One is that the traditional role of system administrator must change to be more involved in application development. The management of security, patching, etc. needs to happen across an undefined number of containers rather than a fixed number of servers. A related complexity is that application developers need to be more aware of system level software, security, conflict management, etc.

While it is true that Docker containers don’t infect application code, they drastically change the application development process and blur traditional lines between application development and system administration.

Security is Complicated

Security considerations for application developers must expand to include understanding of how containers are managed and what level of system access they have. This includes understanding how Linking containers works so that communication between containers and from container to host or from container to internet can be properly secured. Management of persistent data that must survive beyond the container life cycle needs to enforce the same isolation and security that the container promises. This can become tricky in a shared environment.

Configuration is complicated

Application configuration is also complicated, especially communication between containers that are not running on a single logical server, but instead are distributed among multiple servers or even multiple datacenters. Connectivity to shared resources, such as a database or set of files becomes tricky if those are also running in containers. In order to accommodate dynamic life cycle management of containers across server and datacenter boundaries, some configuration will need to be handled outside the container. This too will require careful attention to ensure isolation and protection.

Conclusion

Docker and related containerization tools appear to be a fantastic step in the direction of providing greater developer flexibility and increased hardware utilization. The ability to version infrastructure and deploy variants in minutes is a big positive.

While the impacts on application development don’t directly impact the lines of code written, they challenge conventional roles, such as developer and system administrator. Increased complexity is introduced by creating a linked software stack where connectivity and security between containers need to be addressed, even for small applications.

Software Engineering

Install SSL Enabled MongoDB Subscriber Build

10gen offers a subscriber build of MongoDB which includes support for SSL communication between nodes in a replicaset and between client and mongod. If the cost of a service subscription is prohibitive, it is possible to build MongoDB from source with SSL enabled.

After download, I followed the process below to get it running. For a permanent solution, more attention should be given to where these are installed and how upgrades are handled.

$ tar xzvf mongodb-linux-x86_64-subscription-rhel62-2.2.3.tgz
$ cp mongodb-linux-x86_64-subscription-rhel62-2.2.3/bin/* /usr/local/bin/

Next, it’s necessary to provide an SSL certificate. For testing, it’s easy to create a self-signed SSL certificate.

$ cd /etc/ssl
$ openssl req -new -x509 -days 365 -nodes -out mongodb-cert.pem -keyout mongodb-cert.key -passout pass:mypass
Generating a 2048 bit RSA private key
........................+++
.....................................+++
writing new private key to 'mongodb-cert.key'
-----
You are about to be asked to enter information that will be incorporated
into your certificate request.
What you are about to enter is what is called a Distinguished Name or a DN.
There are quite a few fields but you can leave some blank
For some fields there will be a default value,
If you enter '.', the field will be left blank.
-----
Country Name (2 letter code) [XX]:US
State or Province Name (full name) []:Idaho
Locality Name (eg, city) [Default City]:Boise
Organization Name (eg, company) [Default Company Ltd]:ACME
Organizational Unit Name (eg, section) []:IT
Common Name (eg, your name or your server's hostname) []:host0123
Email Address []:myemail@mail.com

With the certificate created, make a combined pem file as follows

$ cat mongodb-cert.key mongodb-cert.pem > mongodb.pem
$ ll
total 12
lrwxrwxrwx 1 root root   16 May 10  2012 certs -> ../pki/tls/certs
-rw-r--r-- 1 root sys  1704 Feb 14 19:21 mongodb-cert.key
-rw-r--r-- 1 root sys  1395 Feb 14 19:21 mongodb-cert.pem
-rw-r--r-- 1 root sys  3099 Feb 14 19:21 mongodb.pem

Finally, you can start mongodb as follows

$ mongod -dbpath /opt/webhost/local/mongod -logpath /var/log/mongo/mongod.log -keyFile /home/mongod/mongokey --sslOnNormalPorts --sslPEMKeyFile /etc/ssl/mongodb.pem --sslPEMKeyPassword mypass --replSet wildcatset --rest --logappend &

Accessing over SSL

SSL certificate management can be complicated. It is possible to bypass certificate validation when using a self-signed certificate. Python does this by default. Java may require additional work to bypass certificate validation.
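
For example, a PyMongo client could connect to the SSL enabled mongod as sketched below. Option names vary between PyMongo versions (newer releases use tls/tlsAllowInvalidCertificates), so treat this as a sketch rather than the exact incantation.

import ssl
from pymongo import MongoClient

# connect over SSL; CERT_NONE skips validation of the self-signed certificate
client = MongoClient('host0123', 27017, ssl=True, ssl_cert_reqs=ssl.CERT_NONE)
print(client.server_info()['version'])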

Software Engineering

MongoDB Authentication Setup

Authentication in MongoDB provides ‘normal’ access, which is full read and write, or ‘readonly’ access at a database level. There are two scenarios where authentication comes into play: single server and multi-server. When using a single server, authentication can be enabled by adding --auth to the startup parameters.

When using a replicaset, sharded setup or a combination, a key file must be provided and the --keyFile parameter used at startup. This enables each node to communicate with other nodes using a nonce scheme based on the keyFile. In this configuration --auth is implied, and all MongoDB access then requires authentication.

The details below assume a replicaset configuration.

Creating a keyFile

The keyFile must be 1kB or less. It can be any text and for platform independence, any whitespace is ignored. On Linux, OpenSSL can be used to create a solid keyFile like this:

openssl rand -base64 258 > mongokey

At this point, the local file mongokey will contain something like this:

rMvhlWEIzktbhXN+rcTV43z2YKPGsd8YHNNuOVpZLW9bIPx1MaAeGTVullFVY4A5
B0zRpKLXcB347T/m278LK3BNBynB3mVpoe1pPmSYVjpBmo3LhsDKXywb8dU7UrBl
9bgh4NZfNaBcYykuoQsiloWNP5QtMquBymF2bh+1s+aJpvkq1FzAhsJvwcGeILBc
gnBOwZAsDXlE0M1hr0zvsulkyvFDgE2UcS+2tm4yZPKNDygA2HCcXqJypa9L2f1J
dC83SLNxbN4MkeE+NeY3ZE+LFUqTyvb827VhXfCX+S+TpD5h/otiS1GiQnTcBiSB
fYrMhLsOFPU9UYc705XDw48m

It’s important to choose a multiple of 3 (e.g. 258) so that no equal signs (base64 padding) are added to the keyFile.

Installing keyFile

It’s important to protect this file from unauthorized access. One way to do this is to store it in a location that only the limited mongod user can access. In my case I moved the file into a directory owned by the mongod user and set permissions restrictively.

mkdir /home/mongod
mv /home/watrous/mongokey /home/mongod/
chown -R mongod.mongod /home/mongod/
chmod -R 700 /home/mongod/

Update MongoDB configuration

With the keyFile in place and secure, I then updated the configuration file, /etc/mongo.conf, to include a reference to the keyFile by adding this line:

keyFile = /home/mongod/mongokey

MongoDB must then be restarted so that it loads the new configuration. After restarting you may notice chatter in the logs about failed authentication. These errors will go away as the same procedure is completed on the remaining nodes and they have the keyFile available.

Establishing users

Once a keyFile has been installed as described above, MongoDB then requires authentication. Keep in mind that it is not necessary to add --auth as a startup parameter when using a keyFile.

Complete instructions for adding users can be found here: http://docs.mongodb.org/manual/tutorial/control-access-to-mongodb-with-authentication/.

I began by establishing an admin user. I did this by connecting to mongo locally on the primary node:

[watrous@system ~]$ mongo
MongoDB shell version: 2.0.6
connecting to: test
PRIMARY> use admin
switched to db admin
PRIMARY> db.addUser("admin", "bn%c@4fE0ns$!w4TFao$innIjOBKoPS*")
{
        "n" : 0,
        "lastOp" : NumberLong("5828987137181089793"),
        "connectionId" : 60,
        "err" : null,
        "ok" : 1
}
{
        "user" : "admin",
        "readOnly" : false,
        "pwd" : "f9d7f021d49ccc82b5186d16c664c652",
        "_id" : ObjectId("50e4b8eae0bdfc9063b69c32")
}
> db.auth("admin", "bn%c@4fE0ns$!w4TFao$innIjOBKoPS*")
1
PRIMARY> db.system.users.find()
{ "_id" : ObjectId("50e4b8eae0bdfc9063b69c32"), "user" : "admin", "readOnly" : false, "pwd" : "f9d7f021d49ccc82b5186d16c664c652" }

Next I established an account for a specific database. I created two accounts, one with normal access and the other with readonly access.

PRIMARY> use documents
switched to db documents
PRIMARY> db.addUser("documents_full", "*XE@D2x@nc8pfp9iKnA!!Fu!3mTd*HYY")
{
        "n" : 0,
        "lastOp" : NumberLong("5828988434261213185"),
        "connectionId" : 60,
        "err" : null,
        "ok" : 1
}
{
        "user" : "documents_full",
        "readOnly" : false,
        "pwd" : "3cd1cbaec406081b310d7f49b4284c2f",
        "_id" : ObjectId("50e4ba19e0bdfc9063b69c33")
}
PRIMARY> db.addUser("documents_readonly", "91h#Tv5prInoU%GZQDNF9AoAWN5HTEag", true)
{
        "n" : 0,
        "lastOp" : NumberLong("5828988696254218241"),
        "connectionId" : 60,
        "err" : null,
        "ok" : 1
}
{
        "user" : "documents_readonly",
        "readOnly" : true,
        "pwd" : "87cab9e7ce7a5c731b34b1a0737c2ae9",
        "_id" : ObjectId("50e4ba56e0bdfc9063b69c34")
}
PRIMARY> db.system.users.find()
{ "_id" : ObjectId("50e4ba19e0bdfc9063b69c33"), "user" : "documents_full", "readOnly" : false, "pwd" : "3cd1cbaec406081b310d7f49b4284c2f" }
{ "_id" : ObjectId("50e4ba56e0bdfc9063b69c34"), "user" : "documents_readonly", "readOnly" : true, "pwd" : "87cab9e7ce7a5c731b34b1a0737c2ae9" }

At this point I was able to verify authentication and access levels.
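
To verify from Python, a quick check with PyMongo might look like the following. The host name is a placeholder, and db.authenticate/db.logout are the mechanisms available in PyMongo releases contemporary with this version of MongoDB.

from pymongo import MongoClient

client = MongoClient('host0123', 27017)    # placeholder host
db = client['documents']

# the full access account can read and write
db.authenticate('documents_full', '*XE@D2x@nc8pfp9iKnA!!Fu!3mTd*HYY')
print(db.collection_names())

# the readonly account can read, but writes would be rejected
db.logout()
db.authenticate('documents_readonly', '91h#Tv5prInoU%GZQDNF9AoAWN5HTEag')
print(db.collection_names())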

Software Engineering

MongoDB Secure Mode

Security in MongoDB is relatively young in terms of features and granularity. Interestingly, they indicate that a typical use case would be to use Mongo on a trusted network “much like how one would use, say, memcached.”

MongoDB does NOT run in secure mode by default.

As it is, the features that are available are standard, proven and probably sufficient for most use cases. Here’s a quick summary of pros and cons.

  • Pros
    • Nonce-based digest for authentication
    • Security applies across replica set nodes and shard members
  • Cons
    • Few recent replies on security wiki page
    • Coarse-grained access control

User access levels

Coarse-grained access control allows for users to be defined per database and given either read only or read/write access. Since there is no rigid schema in MongoDB, it’s not possible to limit access to a subset of collections or documents.

Limit to expected IPs

Along the lines of the ‘trusted network’ mentioned above, it’s recommended to configure each mongo instance to accept connections only from specific IP addresses. For example, you could limit access to the loopback address, or to an IP on a local private network.

Disable http interface

By default, a useful HTTP based interface provides information about the mongodb instance on a machine and links to similar interfaces on related machines in the replica set. This can be disabled by providing --nohttpinterface when starting mongod.

SSL ready

In cases where SSL security is required, Mongo can be compiled to include support for it. The standard downloads do not include this feature. A standard SSL key can be produced in the usual way, using openssl for example.

Software Engineering

Software licensing: Authentication and authorization for admin pages

For simplicity and security I’ve decided to integrate with the Google Account authentication mechanism that’s built into Google App Engine. This allows anyone with a Google account to login to my application without the need to setup another account. This also gives me access to the user’s valid email in order to send messages and other communication related to the service I provide.

So far I have three separate ‘areas’ for interfacing with my service. The first area is comprised of public pages, such as the home page or privacy policy. The next area is the API where RESTful access will take place. That leaves the administration area where an account administrator will be able to view statistics, adjust licenses, etc. These are mapped as follows

http://domain/
http://domain/api/
http://domain/admin/

The API will require authentication with each call in the form of an apikey (may change to oAuth in the future). I was able to secure the admin area of the site by adding a security-constraint to the web.xml file. Here’s what that looks like.

<web-app ...>
	...
	<security-constraint>
	   <web-resource-collection>
	       <url-pattern>/admin/*</url-pattern>
	   </web-resource-collection>
	   <auth-constraint>
	       <role-name>*</role-name>
	   </auth-constraint>
	</security-constraint>
</web-app>

You might have noticed that this mechanism is not limited to authentication. It’s also possible to include authorization preferences by role using role-name.