A Review of Docker

August 1, 2014

Software Engineering

5 Comments

Daniel Watrous

The most strikingly different characteristic of Docker, when compared to other deployment platforms, is its design principle of a single responsibility per container (although some see it differently). One reason this looks so different is that many application developers view the complete software stack on which they deploy as a collection of components on a single logical server. For developers of larger applications, who already have experience deploying distributed stacks, the security and configuration complexity of Docker may feel more familiar. Docker brings a fresh approach to distributed stacks, one that may seem overly complex to developers of smaller applications, who enjoy the convenience of deploying their full stack to a single logical server.

Link to create applications

Docker does mitigate some of the complexity of a distributed stack by way of Linking. Linking connects multiple containers so that they have access to each other's resources. Linked containers communicate over a private virtual network on the host, on which each container has its own IP address. We'll see later on that shared volumes are a special case of Linking containers.
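
To make the Linking idea concrete, here is a minimal sketch of the application side. Suppose a container is started with something like `docker run -d --link db:db example/webapp`, where `db` and `example/webapp` are hypothetical names. Docker's legacy links then expose the linked container's address to the recipient through environment variables named `<ALIAS>_PORT_<port>_<PROTO>_ADDR` and `..._PORT`. The helper function and the sample address below are illustrative assumptions, not part of Docker.

```python
import os
from typing import Mapping, Tuple


def linked_service_address(alias: str, port: int, proto: str = "tcp",
                           env: Mapping[str, str] = os.environ) -> Tuple[str, int]:
    """Return (host, port) for a container linked under `alias`.

    Follows the naming convention of Docker's legacy links:
    <ALIAS>_PORT_<port>_<PROTO>_ADDR and <ALIAS>_PORT_<port>_<PROTO>_PORT.
    """
    prefix = f"{alias.upper()}_PORT_{port}_{proto.upper()}"
    try:
        host = env[f"{prefix}_ADDR"]
        linked_port = int(env[f"{prefix}_PORT"])
    except KeyError as missing:
        raise RuntimeError(f"link '{alias}' not found: {missing} is unset") from None
    return host, linked_port


if __name__ == "__main__":
    # Hypothetical values, as Docker might inject them for `--link db:db`
    # when the linked container exposes port 5432/tcp.
    sample_env = {
        "DB_PORT_5432_TCP_ADDR": "172.17.0.5",
        "DB_PORT_5432_TCP_PORT": "5432",
    }
    print(linked_service_address("db", 5432, env=sample_env))  # ('172.17.0.5', 5432)
```

Passing `env` explicitly keeps the lookup testable outside a container. Inside a linked container, the default reads the real environment.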

Statelessness and Persistence

One core concept behind Docker containers is that they are transient: they are fast and easy to start, stop and destroy. When a container stops, the CPU and memory it was using are immediately returned to the system, and anything written inside the container is lost once the container is removed. This stateless approach can be a good fit for modern web applications, where statelessness simplifies scaling and concurrency. However, the question remains of what to do with truly persistent data, such as records in a database.

Docker answers this question of persistence with Volumes. At first glance, Volumes appear only to provide persistence shared between running containers, but they can also be configured to share data between the host and a container, and that data survives after the container exits.
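
A host-to-container volume is requested with `docker run -v HOST_PATH:CONTAINER_PATH`. An optional `:ro` suffix makes it read-only inside the container. (Docker's `--volumes-from` option, which mounts another container's volumes, is the container-to-container counterpart.) The sketch below only builds the argument list. The image name and paths in the test are hypothetical, and passing the list to `subprocess.run` is what would actually start the container.

```python
import posixpath
from typing import List


def bind_mount_args(image: str, host_path: str, container_path: str,
                    read_only: bool = False) -> List[str]:
    """Build `docker run` arguments that mount a host directory into a container.

    Data written to host_path stays on the host after the container exits.
    """
    # Docker expects absolute paths on both sides of a bind mount.
    for path in (host_path, container_path):
        if not posixpath.isabs(path):
            raise ValueError(f"bind mount paths must be absolute: {path!r}")
    spec = f"{host_path}:{container_path}"
    if read_only:
        spec += ":ro"  # the container can read the data but not modify it
    return ["docker", "run", "-d", "-v", spec, image]
```

For example, `bind_mount_args("postgres", "/srv/pgdata", "/var/lib/postgresql/data")` describes a database container whose data directory lives on the host.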

It's important to note that storing data outside the container breaches one isolation barrier and could become an attack vector into any container that uses that data. It's also important to understand that any data stored outside the container may require its own infrastructure to manage, back up, sync, etc., since Docker only manages containers.

Infrastructure Revision Control

Docker lifts infrastructure dependencies out of system administration and into the deployment artifact itself by encapsulating application dependencies inside a single container. This encapsulation makes it possible to maintain versioned deployment artifacts, either as a Buildfile or as a binary image. That opens up some interesting possibilities, such as testing a new server configuration or redeploying an old server configuration in minutes.
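
One way to use versioned images is to tag each build, e.g. `docker build -t example/webapp:1.2 .`, and keep a record of the tags. Redeploying an old configuration then amounts to running an earlier tag. The sketch below assumes a hypothetical image name `example/webapp` and a tag history kept oldest to newest by some build pipeline. It only computes which tag and arguments to use.

```python
from typing import List, Sequence


def deploy_args(image: str, tag: str, name: str) -> List[str]:
    """`docker run` arguments for one specific, versioned image.

    A previous container with the same name must be removed first.
    """
    return ["docker", "run", "-d", "--name", name, f"{image}:{tag}"]


def rollback_tag(history: Sequence[str], current: str) -> str:
    """Return the tag that was built immediately before `current`.

    `history` lists tags from oldest to newest.
    """
    index = list(history).index(current)  # ValueError if current is unknown
    if index == 0:
        raise ValueError(f"no earlier version than {current!r}")
    return history[index - 1]
```

With `history = ["1.0", "1.1", "1.2"]`, rolling back from `1.2` yields `deploy_args("example/webapp", "1.1", "web")`.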

Two Ways to Build Docker Container Images

Docker provides two ways to create a new container image.

Buildfile
Modify an existing container

Buildfile

A Buildfile (Docker names this file a Dockerfile) is similar to a Vagrantfile. It references a base image (the starting point) and a sequence of instructions to execute on that base image to arrive at a desired end state. For example, one could start with the Ubuntu base image, run apt-get to install the Nginx web server, and copy in a default configuration. Once the image is built, it can be used to create new containers that have those dependencies ready to go.
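
The Nginx example above might look like this as Buildfile text: a `FROM` line naming the base image, followed by instructions such as `RUN`, `COPY`, `EXPOSE` and `CMD`. The Python sketch below renders that text. It assumes a configuration file named `default` sits next to the Buildfile, which is a hypothetical choice for illustration. Saving the output as `Dockerfile` and running `docker build -t example/nginx .` would build the image.

```python
from typing import Iterable, Tuple


def render_dockerfile(base_image: str, steps: Iterable[Tuple[str, str]]) -> str:
    """Render Buildfile text: a FROM line followed by instruction/argument pairs."""
    lines = [f"FROM {base_image}"]
    for instruction, argument in steps:
        lines.append(f"{instruction.upper()} {argument}")
    return "\n".join(lines) + "\n"


nginx_steps = [
    ("run", "apt-get update && apt-get install -y nginx"),
    # Hypothetical local file copied over the packaged default site config.
    ("copy", "default /etc/nginx/sites-available/default"),
    ("expose", "80"),
    # Keep nginx in the foreground so the container keeps running.
    ("cmd", '["nginx", "-g", "daemon off;"]'),
]

if __name__ == "__main__":
    print(render_dockerfile("ubuntu:14.04", nginx_steps))
```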

Images are containers in embryo, similar to how a class is an object in embryo.

A Buildfile can also be kept in a git repository, and builds can be triggered automatically whenever a change to the Buildfile is committed.

Modify an existing container

Instead of writing a Buildfile, which is a text file containing commands, it is also possible to start a container from an existing image and run '/bin/bash'. From the bash prompt, any desired changes can be made. These changes are recorded in the container's own filesystem rather than in the original image. The modified container can then be committed as a new image, which can be pushed to Docker Hub or stored elsewhere for later use.

In either case, the result is a binary image that can be used to create a container providing a specific dependency profile.
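
The relationship between images and containers can be illustrated with a toy model. This is not Docker's implementation. It only mirrors the idea that an image is a read-only stack of layers, a container adds its own writable layer on top, and committing (as `docker commit` does after a session started with `docker run -i -t ubuntu /bin/bash`) freezes that layer into a new image. All class names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A layer is an immutable collection of (path, content) changes.
Layer = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Image:
    """A read-only stack of layers; an image is never modified after creation."""
    layers: Tuple[Layer, ...]

    def read(self, path: str) -> Optional[str]:
        # Upper layers shadow lower ones, as in a union filesystem.
        for layer in reversed(self.layers):
            for layer_path, content in layer:
                if layer_path == path:
                    return content
        return None


@dataclass
class Container:
    """An instance of an image plus its own private, writable layer."""
    image: Image
    writable: Dict[str, str] = field(default_factory=dict)

    def write(self, path: str, content: str) -> None:
        self.writable[path] = content  # only the container's layer changes

    def read(self, path: str) -> Optional[str]:
        if path in self.writable:
            return self.writable[path]
        return self.image.read(path)

    def commit(self) -> Image:
        """Freeze the writable layer into a new image, roughly like `docker commit`."""
        new_layer: Layer = tuple(sorted(self.writable.items()))
        return Image(layers=self.image.layers + (new_layer,))


if __name__ == "__main__":
    os_layer: Layer = (("/etc/os-release", "Ubuntu 14.04"),)
    base = Image(layers=(os_layer,))
    session = Container(base)
    session.write("/usr/sbin/nginx", "<nginx binary>")
    custom = session.commit()
    print(base.read("/usr/sbin/nginx"))    # None: the base image is unchanged
    print(custom.read("/usr/sbin/nginx"))  # <nginx binary>
```

The analogy from earlier holds in the model: `Image` plays the role of the class, and each `Container` is an object instantiated from it.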

Scaling Docker

Docker alone doesn't answer the question of how to scale out containers, although many projects are trying to answer it. It's important to know that containerizing an application doesn't automatically make it easier to scale. Someone still has to write the logic to build, monitor, link, distribute, secure, update and otherwise manage containers.
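
As a small example of that management logic, a monitoring loop might periodically compare how many containers of each image should be running with how many actually are. The sketch below covers only that comparison, under the assumption that some other hypothetical component reports running container IDs and acts on the result.

```python
from typing import Dict, List, Tuple


def reconcile(desired: Dict[str, int],
              running: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Compare desired replica counts with the containers actually running.

    desired: image -> number of containers that should be running
    running: image -> IDs of containers currently running
    Returns (images_to_start, container_ids_to_stop); an image appears in
    images_to_start once per missing container.
    """
    to_start: List[str] = []
    to_stop: List[str] = []
    for image in sorted(set(desired) | set(running)):
        want = desired.get(image, 0)
        have = running.get(image, [])
        if len(have) < want:
            to_start.extend([image] * (want - len(have)))
        else:
            to_stop.extend(have[want:])  # surplus containers beyond the desired count
    return to_start, to_stop
```

Everything around this function, including starting containers, choosing hosts, and re-linking, is the part Docker leaves to other tools.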

Not Small VMs

It should be obvious by this point that Docker containers are not intended to be small Virtual Machines. They are isolated, single-function containers, each responsible for a single task, linked together to provide a complete software stack. This is similar to the Single Responsibility Principle. Each container should have a single responsibility, which increases the likelihood of reuse and decreases the complexity of ongoing management.
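
One practical consequence of linking single-purpose containers is start-up order. With legacy links, a container can only link to containers that already exist, so the stack has to be started in dependency order. The sketch below computes that order with Python's standard `graphlib` module (Python 3.9+) for a hypothetical LEMP-style stack. The container names are illustrative.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def start_order(links: Dict[str, Set[str]]) -> List[str]:
    """Order containers so each starts after the containers it links to.

    links: container name -> names of the containers it links to.
    Raises graphlib.CycleError if the links form a cycle.
    """
    return list(TopologicalSorter(links).static_order())


if __name__ == "__main__":
    lemp = {
        "nginx": {"php-fpm"},   # web server links to the PHP workers
        "php-fpm": {"mysql"},   # PHP workers link to the database
        "mysql": set(),
    }
    print(start_order(lemp))  # ['mysql', 'php-fpm', 'nginx']
```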

Application Considerations

I would characterize most of the discussion above as infrastructure considerations. There are several application-specific considerations to review.

PaaS Infection of Application Code

Many PaaS solutions infect application code. This may take the form of requiring certain libraries or specific language versions, or of adhering to specific resource structures. The promised trade-off is that, in exchange for these rigid application requirements, the developer enjoys greater ease and reliability when deploying and scaling an application and can largely ignore system-level concerns.

The container approach that Docker takes doesn't infect application code, but it drastically changes deployment. Docker is so flexible, in fact, that it becomes possible to run different application components with different dependencies, such as differing versions of the same programming language. Application developers are free to use any dependencies that suit their needs and develop in any environment they like, including a full stack on a single logical server. No special libraries are required.

While this sounds great, it also increases application complexity in several ways, some of which are unexpected. One is that the traditional role of system administrator must change to be more involved in application development. The management of security, patching, etc. needs to happen across a variable and potentially large number of containers rather than a fixed number of servers. A related complexity is that application developers need to be more aware of system-level software, security, conflict management, etc.
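
For example, patching a base image means finding every running container whose image was built from the old base, then rebuilding and replacing it. The sketch below does only the finding step. It assumes a hypothetical inventory that maps containers to images and images to the base image they were built `FROM`.

```python
from typing import Dict, List


def containers_needing_rebuild(container_images: Dict[str, str],
                               image_bases: Dict[str, str],
                               patched_base: str) -> List[str]:
    """List containers whose image is not built on the patched base image.

    container_images: container name -> image it runs
    image_bases: image -> base image it was built FROM
    """
    stale: List[str] = []
    for container, image in sorted(container_images.items()):
        # An image with an unknown base cannot be shown to be patched, so flag it.
        if image_bases.get(image) != patched_base:
            stale.append(container)
    return stale
```

Unlike a fixed list of servers, the container inventory changes constantly, so a check like this has to be rerun whenever containers come and go.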

While it is true that Docker containers don’t infect application code, they drastically change the application development process and blur traditional lines between application development and system administration.

Security is Complicated

Security considerations for application developers must expand to include understanding of how containers are managed and what level of system access they have. This includes understanding how Linking containers works so that communication between containers and from container to host or from container to internet can be properly secured. Management of persistent data that must survive beyond the container life cycle needs to enforce the same isolation and security that the container promises. This can become tricky in a shared environment.
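
Two `docker run` options illustrate the kind of detail developers now need to check. `-p 8080:80` publishes a port on every host interface, while `-p 127.0.0.1:8080:80` binds it to one address. A host mount given as `-v /host:/container` is writable unless `:ro` is appended. (The Docker daemon's `--icc=false` setting, which blocks container-to-container traffic other than explicit links, is a related control.) The sketch below audits a hypothetical description of how a container will be started. The `RunPlan` type is an assumption for illustration, and IPv6 port specs are not handled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RunPlan:
    """Hypothetical description of how one container will be started."""
    name: str
    published_ports: List[str] = field(default_factory=list)  # values for -p
    volumes: List[str] = field(default_factory=list)          # values for -v


def audit(plan: RunPlan) -> List[str]:
    """Flag settings that widen a container's exposure, for human review."""
    findings: List[str] = []
    for spec in plan.published_ports:
        parts = spec.split(":")
        # Only "ip:hostPort:containerPort" (or "ip::containerPort") binds to a
        # single host address; "hostPort:containerPort" or a bare container
        # port is published on every host interface.
        if len(parts) != 3 or not parts[0]:
            findings.append(f"{plan.name}: port {spec} is published on all host interfaces")
    for spec in plan.volumes:
        parts = spec.split(":")
        # "hostPath:containerPath" is writable unless ":ro" is given; a single
        # path is a Docker-managed volume rather than a host directory.
        if len(parts) >= 2 and (len(parts) == 2 or parts[2] != "ro"):
            findings.append(f"{plan.name}: host mount {spec} is writable from the container")
    return findings
```

A finding is not necessarily a mistake. A database container legitimately needs a writable mount, but each one is a place where the isolation the container promises has to be enforced by other means.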

Configuration is Complicated

Application configuration is also complicated, especially communication between containers that are not running on a single logical server but are instead distributed among multiple servers or even multiple datacenters. Connectivity to shared resources, such as a database or a set of files, becomes tricky if those are also running in containers. To accommodate dynamic life cycle management of containers across server and datacenter boundaries, some configuration will need to be handled outside the container. This too will require careful attention to ensure isolation and protection.
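
A common way to handle configuration outside the container is to inject it at start-up, e.g. `docker run -e DB_HOST=10.0.2.15 -e DB_NAME=orders ...`, and have the application read it from its environment. This works where links cannot, such as across hosts. The variable names, the default port and the address below are hypothetical. Whatever tooling tracks where the database is running would supply the real values.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DatabaseConfig:
    host: str
    port: int
    name: str


def load_database_config(env: Mapping[str, str] = os.environ) -> DatabaseConfig:
    """Read database connection settings injected from outside the container."""
    missing = [key for key in ("DB_HOST", "DB_NAME") if key not in env]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")
    return DatabaseConfig(
        host=env["DB_HOST"],
        port=int(env.get("DB_PORT", "5432")),  # assumed default: PostgreSQL's port
        name=env["DB_NAME"],
    )
```

Failing fast on missing settings makes a misconfigured container obvious at start-up rather than at the first query.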

Conclusion

Docker and related containerization tools appear to be a fantastic step in the direction of providing greater developer flexibility and increased hardware utilization. The ability to version infrastructure and deploy variants in minutes is a big positive.

While Docker's effects on application development don't directly change the lines of code written, they challenge conventional roles, such as developer and system administrator. Linking a software stack together from separate containers adds complexity, because connectivity and security between containers must be addressed, even for small applications.

Comments

Daniel Watrous : September 3, 2014 at 10:59 am

Just found this article by Martin Fowler on microservices. He echoes some of what I say here about the requirement for increased crossover between development and system administration. http://martinfowler.com/bliki/MicroservicePrerequisites.html

The road to PaaS | Daniel Watrous on Software Engineering : November 10, 2014 at 12:11 pm

[…] chef and even Vagrant). This made it possible to think of systems as more transient. With the advent of Linux containers, the idea of infrastructure as code became even more prevalent. Time to provision is approaching […]

Use Docker to Build a LEMP Stack (Buildfile) | Daniel Watrous on Software Engineering : October 29, 2015 at 10:51 am

[…] been reviewing Docker recently. As part of that review, I decided to build a LEMP stack in Docker. I use Vagrant to […]

Infrastructure as Code | Daniel Watrous on Software Engineering : September 21, 2017 at 9:24 am

[…] than virtual servers, but questions about how to handle networking and storage remained. The docker best practice of single function containers drove up the number of instances when compared to more complex virtual servers that filled multiple […]

Infrastructure as Code – Daniel Watrous on Software Engineering : January 5, 2021 at 8:14 pm

[…] than virtual servers, but questions about how to handle networking and storage remained. The docker best practice of single function containers drove up the number of instances when compared to more complex virtual servers that filled multiple […]
