Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged docker


kubernetes overview

Kubernetes is getting a lot of attention recently, and there is good reason for that. Docker containers alone are little more than a developer convenience. Orchestration moves containers from the laptop into the datacenter. Kubernetes does that in a way that simplifies development and operations. Unfortunately, I struggled to find easy-to-understand, high-level descriptions of how kubernetes works, so I made the diagram below.

Operations

While I don’t show the operator specifically (usually someone in IT, or a managed offering like GKE), everything in the yellow box would be managed by the operator. The nodes host pods. The master nodes facilitate orchestration. The gateway facilitates incoming traffic. A bastion host is often used to manage members of the cluster, but isn’t shown above.

Persistent storage, in the form of block storage, databases, etc., is also not directly part of the cluster. There are kubernetes resources, like Persistent Volumes, that can be used to associate external persistent storage with pods, but the management of the storage solution is outside the scope of managing kubernetes.

Development

The developer performs two primary activities: creating docker images and telling kubernetes to deploy them. Kubernetes is not opinionated when it comes to container image design. This means a developer can choose to manage ports, volumes, libraries, runtimes, and so on in any way that suits them. Kubernetes also has no opinion about which container registry is used, as long as it is compliant with the Docker registry v2 spec.

There are a few ways to deploy a container image to kubernetes. The diagram above shows two: one based on an application definition (usually in YAML format) and the other using shortcut commands. Both are typically triggered using the kubectl CLI. In either case, the developer gives kubernetes a desired state that includes details such as which container image to run, how many replicas to maintain and which ports to expose. Kubernetes then assumes the job of ensuring that the desired state is the actual state. When nodes in a cluster change, or containers fail, kubernetes acts in real time to do what is necessary to get back to the desired state.
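To make that concrete, here is a minimal sketch of an application definition; the application name, image and port are illustrative, not taken from the diagram.

hello-app.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app                 # hypothetical application name
spec:
  replicas: 3                     # desired number of pod replicas
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
      - name: hello-app
        image: registry.example.com/hello-app:1.0   # hypothetical image
        ports:
        - containerPort: 8080     # port the container listens on

Handing this file to the cluster with kubectl apply -f hello-app.yaml gives kubernetes the desired state; scaling later is as simple as kubectl scale deployment hello-app --replicas=5.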

Consumer

The consumer doesn’t need to know anything about the kubernetes cluster, its members or even that kubernetes is being used. The gateway shown might be an nginx reverse proxy or HAProxy. The important point is that the gateway needs to be able to route to the pods, which are generally managed on an overlay network like flannel or calico. It is possible to create redundant gateways and place them behind a load balancer.

Services are used to expose pods (and deployments). Usually the service type LoadBalancer will trigger an automatic reconfiguration of the gateway to route traffic. Since the gateway is a single host, each port can be used only once. To get around this limitation, it is possible to use an ingress controller to provide name-based routing.
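As a sketch, a service exposing the deployment from the earlier example might look like the following; names and ports are again illustrative.

hello-app-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: hello-app
spec:
  type: LoadBalancer        # signals that external traffic should reach these pods
  selector:
    app: hello-app          # route to pods carrying this label
  ports:
  - port: 80                # port exposed externally
    targetPort: 8080        # port the container listens on

An ingress resource would play a similar role for name-based routing, mapping host names to services instead of consuming a dedicated port.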

Conclusion

Kubernetes definitely has its share of complexity, but depending on your role it can be very approachable. Cluster installation is by far the most difficult part; after that, the learning curve is quite small.


IT General Controls: Infrastructure vs Routing

IT general controls are important for various reasons, such as business continuity and regulatory compliance. Traditionally, controls have focused on the infrastructure itself. In the context of long running servers in fixed locations, this was often an effective approach. As virtualization and container technologies become more prevalent, especially in public cloud, infrastructure focused IT controls can start to get in the way of realizing the following benefits:

  • Just in time provisioning
  • Workload migration
  • Network isolation
  • Tight capacity management
  • DevOps
  • Automated deployments
  • Automated remediation

One way to maintain strong IT controls and still get the above benefits is to shift the focus of those controls away from the infrastructure and instead focus on routing (traffic management).

As shown above, a focus on routing ensures that IT can control where production traffic is routed, including production data. Engineering teams are free to deploy as needed and automation can be used freely. Since infrastructure is replaced with each deployment, rather than updated, there is no need to maintain rigid controls around any specific server, VM or container.

In the diagram shown, a gateway is used to facilitate routing. Other mechanisms, like segregated container image repositories and deployment environments, may also be appropriate.


Infrastructure as Code

One of the most significant enablers of IT and software automation has been the shift away from fixed infrastructure to flexible infrastructure. Virtualization, process isolation, resource sharing and other forms of flexible infrastructure have been in use for many decades in IT systems. It can be seen in early Unix systems, Java application servers and even in common tools such as Apache and IIS in the form of virtual hosts. If flexible infrastructure has been a part of technology practice for so long, why is it getting so much buzz now?

Infrastructure as Code

In the last decade, virtualization has become more accessible and transparent, in part due to text-based abstractions that describe infrastructure systems. There are many such abstractions that span IaaS, PaaS, CaaS (containers) and other platforms, but I see four major categories of tools that have emerged.

  • Infrastructure Definition. This is the closest to defining actual server, network and storage resources.
  • Runtime or system configuration. This operates on compute resources to overlay system libraries, policies, access control, etc.
  • Image definition. This produces an image or template of a system or application that can then be instantiated.
  • Application description. This is often a composite representation of infrastructure resources and relationships that together deliver a functional system.

Right tool for the right job

I have observed a trend among these toolsets to expand their scope beyond one of these categories to encompass all of them. For example, rather than use a chain of tools such as Packer to define an image, HEAT to define the infrastructure and Ansible to configure the resources and deploy the application, someone will try to use Ansible to do all three. Why is that bad?

A tool like HEAT is directly tied to the OpenStack charter. It endeavors to adhere to the native APIs as they evolve. The tool is accessible, reportable and integrated into the OpenStack environment where the managed resources are also visible. This can simplify troubleshooting and decrease development time. In my experience, a tool like Ansible generally lags behind in features and API support, and lacks native interface integration. Some argue that using a tool like Ansible makes the automation more portable between cloud providers. Given the different interfaces and underlying APIs, I haven’t seen this actually work. There is always a frustrating translation when changing providers, and in many cases there is additional frustration due to idiosyncrasies of the tool, which could have been avoided by using more native interfaces.

The point I’m driving at is that when a native, supported and integrated tool exists for a given stage of automation, it’s worth exploring, even if it represents another skill set for those who develop the automation. The insight gained can often lead to a more robust and appropriate implementation. In the end, a tool can call a combination of HEAT and Ansible as easily as just Ansible.
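As illustration, a thin wrapper script can drive both tools in sequence; the stack, template and playbook names here are assumptions, not a prescribed layout.

#!/usr/bin/env bash
# provision the infrastructure with the native OpenStack tooling
openstack stack create --template infrastructure.yaml app-stack
# wait/poll until the stack reports CREATE_COMPLETE
openstack stack show app-stack -f value -c stack_status
# then configure the resources and deploy the application
ansible-playbook -i inventory/app-stack configure-and-deploy.yml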

Containers vs. Platforms

Another lively discussion over the past few years revolves around where automation efforts should focus. AWS popularized the idea that automation at the IaaS layer was the way to go. A lot of companies have benefited from that, but many more have found the learning curve too steep and the cost of fixed resources too high. Along came Heroku, which promised to abstract away all the complexity of IaaS but still deliver all the benefits. The cost of that benefit came in either reduced flexibility or a steep learning curve to create new deployment contexts (called buildpacks). When Docker came along and provided a very easy way to produce a single-function image that could be quickly instantiated, it spawned discussion about how the container lifecycle should be orchestrated.

Containers moved the concept of image creation away from general purpose compute, which had been the focus of IaaS, and toward specialized compute, such as a single application executable. Start time and resource efficiency made containers more appealing than virtual servers, but questions about how to handle networking and storage remained. The docker best practice of single-function containers drove up the number of instances when compared to more complex virtual servers that filled multiple roles and had longer life cycles. Orchestration became the key to reliable container-based deployments.

The descriptive approaches that evolved to accommodate containers, such as kubernetes, provide more ease and speed than IaaS, while providing more transparency and control than PaaS. Containers make it possible to define an entire application deployment scenario, including images, networking, storage, configuration, routing, etc., in plain text and trust the Container as a Service (CaaS) platform to orchestrate it all.

Evolution

Up to this point, infrastructure as code has evolved from shell and bash scripts, to infrastructure definitions for IaaS tools, to configuration and image creation tools that describe what those environments look like, to full application deployment descriptions. What remains to mature are configuration, secret management and the regional distribution of compute for performance and edge data processing.


Kubernetes vs. Docker Datacenter

I found this article on serverwatch today: http://www.serverwatch.com/server-trends/why-kubernetes-is-all-conquering.html

It’s not technically deep, but it does highlight the groundswell of interest in and adoption of kubernetes. It’s also worth noting that GCE and Azure will now both have a native, fully managed kubernetes offering. I haven’t found a fully managed Docker Datacenter offering, but I’m sure there is one. It would be interesting to compare the two from a public cloud offering perspective.

I’ve worked a lot with OpenStack for on premises clouds. This naturally leads to the idea of using OpenStack as a platform for container orchestration platforms (yes, I just layered platforms). As of today, the process of standing up Docker Datacenter or kubernetes still needs to mature. Last month eBay mentioned that it created its own kubernetes deployment tool on top of openstack: http://www.zdnet.com/article/ebay-builds-its-own-tool-to-integrate-kubernetes-and-openstack/. While it does plan to open source the new tool, it’s not available today.

One OpenStack vendor, Mirantis, provides support for kubernetes through Murano as its preferred container solution: https://www.mirantis.com/solutions/container-technologies/. I’m not sure how reliable Murano is for long-term management of kubernetes. For organizations that have an OpenStack vendor, support like this could streamline the evaluation and adoption of containers in the enterprise.

I did find a number of demo, PoC, kick-the-tires examples of Docker Datacenter on OpenStack, but not much automation or production support. I still love the idea of using the Docker Trusted Registry. I know that kubernetes provides a private registry component (https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/registry), but it’s not as sophisticated as Docker Trusted Registry in terms of signing, scanning, etc. However, this functionality is quickly making its way into kubernetes, with some functionality already available in alpha: https://github.com/kubernetes/kubernetes/issues/22888

On the whole, I’m more drawn to kubernetes from a holistic point of view, but Docker is effectively keying into some real enterprise concerns. Given the open source community and vendor investment in kubernetes, I expect the enterprise gap (like a trusted registry for kubernetes) will close this year.


High level view of Container Orchestration

Container orchestration is at the heart of a successful container architecture. Orchestration takes as input a definition of how a deployed application should look. This usually includes how many containers are needed for a certain image, volumes for persistent data, networking for communication between containers and awareness of various discovery mechanisms. Discovery may include such things as identifying other containers which are also participating with the application, or how to access services required by the running containers. Here’s a high level view.

[Diagram: container architecture]

Infrastructure

Containers need infrastructure to run. Both virtual and physical infrastructure can be used to host containers. Some argue that it’s better to run containers directly on physical servers to get maximum performance. While there are performance benefits, there is also more operational overhead in standing up and maintaining physical servers. Automation available in virtual environments often makes it easier to provision, monitor and remediate servers. Using virtual infrastructure also makes it possible to share capacity between different types of workloads, some of which may not be optimized for containers. Tools like Docker Cloud (formerly Tutum) and Rancher can streamline operations for virtual environments.

If all workloads will be containerized and top performance is critical, favor a physical deployment. If some applications will still require IaaS and capacity will be shared between various types of workloads, choose virtual.

Orchestration

Orchestration is the process by which containers are managed to ensure that a predefined application configuration is maintained. Orchestration tools typically accept a plain text definition (usually YAML) of which container images are wanted, networking between those containers, mounted volumes, etc. The orchestration tool uses this definition to pull the necessary images and create containers, set up networking and mount storage.
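For a feel of what such a definition looks like, here is a minimal sketch in Docker Compose format; the image names and volume are invented for illustration.

docker-compose.yml

version: "2"
services:
  web:
    image: local/nginx:v1          # hypothetical web tier image
    ports:
      - "80:80"                    # expose web traffic on the host
    depends_on:
      - db
  db:
    image: local/mysql:v1          # hypothetical database image
    volumes:
      - dbdata:/var/lib/mysql      # persistent data survives container replacement
volumes:
  dbdata: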

Kubernetes

Kubernetes (http://kubernetes.io/) was originally contributed to the open source community by Google and was based on Borg, its decade-old internal container technology. It aims to be a comprehensive container management platform providing everything from orchestration to monitoring to service discovery and more. It abstracts the container technology in what it calls a pod, making it possible to use Docker or rkt or any other technology that comes along in the future. For many people the appeal of this platform is that it has no direct tie back to a commercial vendor, so investment is more likely to be driven by the community.

Swarm

Docker Swarm is Docker’s orchestration layer. It is designed to integrate seamlessly with other Docker tools, including the Docker daemon and registry tools. Some of the appeal of Swarm has to do with simplicity. Swarm is more narrowly focused than kubernetes, which may suggest better focus and more flexibility in choosing the right solution for each container management need, although it’s optimal to stick with Docker solutions.

PaaS

As containers continue to grow in prominence, some PaaS solutions, such as cloudfoundry, are reworking their narrative to position themselves as container management systems. It is true that the current version of cloudfoundry supports direct deployment of Docker container images and provides platform components, like routing, health management and scaling. Some drawbacks to using a PaaS for container orchestration are that deployments become more prescriptive and the platform provides less granular control over container deployment and interactions.

Image management

Container images can be created in several ways, including with a mechanism like a Dockerfile or with other automation tools. Container images should never contain credentials or other sensitive data (see Discovery below). In some cases it may be appropriate to host an internal container registry. External registry options that provide private images may provide sufficient protection for some applications.

Another aspect of image security has to do with vulnerabilities. Some registry solutions provide image scanning tools that can detect vulnerabilities or out of date packages. When external images are used as a base for internal application images, these should be carefully curated and confirmed to be safe before using them to derive application images.

Automation

One motivation behind containerization is that it better accommodates Continuous Integration (CI) and Continuous Delivery (CD). When building CI/CD pipelines, it’s important that the orchestration layer make it easy to automate the lifecycle of containers for unit tests, functional tests, load tests and other automatic verification of the current state of an application. The CI/CD pipeline may be responsible for both triggering container creation as well as creating the container image.

Two-way communication with CI/CD tooling is important so that the end result of testing and validation can be reported and possibly acted on by the CI/CD tool, affecting later stages.

Discovery

Discovery is the process by which a container identifies the other containers and services it needs in order to function, or registers itself to be found by the containers with which it participates. Discovery may include scenarios such as finding a database or static file storage with data necessary to run, or identifying other containers across which requests are distributed in order to accommodate synchronization.

Two common solutions for Discovery include a distributed key/value store and DNS. A distributed key/value store, such as etcd, ensures that each physical node hosting containers has a synchronized set of key/value data. In this scenario, the orchestration tool can add details about newly created containers to the key/value store so that existing containers are aware of them. New containers can query the key/value store to identify related containers and services.
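A sketch of that flow with the etcd v2 CLI might look like this; the key structure and values are illustrative.

# the orchestration tool registers a newly created container
etcdctl set /services/web/container1 '{"host": "10.0.0.5", "port": 8080}'
# an existing container lists its peers
etcdctl ls /services/web
# and reads the connection details for one of them
etcdctl get /services/web/container1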

DNS-based discovery (a popular tool is Consul) is very similar, except that DNS is used to manage resolution of services and containers based on URLs. In this way, new containers can simply call the predetermined URL and trust that the request will be routed to the appropriate container or resource. As containers change, DNS is updated in real time so that no changes are required on individual containers.
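As a sketch, a container could resolve a service through Consul's DNS interface; the service name and the default DNS port 8600 are assumptions about the setup.

# look up host and port for the 'web' service via SRV records
dig @127.0.0.1 -p 8600 web.service.consul SRV +short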


What is Cloud Native?

I hear a lot of people talking about cloud native applications these days. This includes technologists and business managers. I have found that there really is a spectrum of meaning for the term cloud native and that two people rarely mean the same thing when they say cloud native.

At one end of the spectrum would be running a traditional workload on a virtual machine. In this scenario the virtual host may have been manually provisioned, manually configured, manually deployed, etc. Its cloudiness comes from the fact that it’s a virtual machine running in the cloud.

I tend to think of cloud native at the other end and propose the following definition:

The ability to provision and configure infrastructure, stage and deploy an application, and address the scale and health needs of the application in an automated and deterministic way, without human interaction.

The activities necessary to accomplish the above are:

  • Provision
  • Configure
  • Build and Test
  • Deploy
  • Scale and Heal

[Diagram: application lifecycle elements]

Provision and Configure

The following diagram illustrates some of the workflow involved in provisioning and configuring resources for a cloud native application.

You’ll notice that there are some abstractions listed, including HEAT for openstack, CloudFormation for AWS and even Terraform, which can provision against both openstack and AWS. You’ll also notice that I include a provision flow that produces an image rather than an actual running resource. This can be helpful when using IaaS directly, but becomes essential when using containers. The management of that image creation process should include a CI/CD pipeline and a versioned image registry (more about that another time).
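To give a feel for one of these abstractions, here is a minimal HEAT template sketch; the image, flavor and network names are placeholders.

provision.yaml

heat_template_version: 2015-10-15

resources:
  app_server:
    type: OS::Nova::Server
    properties:
      image: ubuntu-14.04         # hypothetical image name
      flavor: m1.small            # hypothetical flavor
      networks:
        - network: private-net    # hypothetical network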

Build, Test, Deploy

With provisioning defined it’s time to look at the application Build, Test and Deploy steps. These are depicted in the following figure:

The color of the “Prepare Infrastructure” activity should hint that in this process it represents the workflow shown above under Provision and Configure. For clarity, various steps have been grouped under the heading “Application Staging Process”. While these can occur independently (and unfortunately sometimes testing never happens), it’s helpful to think of those four steps as necessary to validate any potential release. It should be possible to fully automate the staging of an application.

Discovery

The discovery step is often still done in a manual way using configuration files or even manual edits after deploy. Discovery could include making sure application components know how to reach a database, or how a load balancer knows to which application servers it should direct traffic. In a cloud native application, this discovery should be fully automated. When using containers it will be essential and very fluid. Some mechanisms that accommodate discovery include system-level tools like etcd and DNS-based tools like consul.

Monitor and Heal or Scale

There are loads of monitoring tools available today. A cloud native application requires monitoring to be close to real time and needs to be able to act on monitoring outputs. This may involve creating new resources, destroying unhealthy resources and even shifting workloads around based on latency or other metrics.

Tools and Patterns

There are many tools to establish the workflows shown above. The provision step will almost always be provider specific and based on the provider’s API. Some tools, such as terraform, attempt to abstract this away from the provider, with mixed results. The configure step might include Ansible or a similar tool. The build, test and deploy process will likely use a tool like Jenkins to accomplish automation. In some cases the above process may include multiple providers, all integrated by your application.
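As one possible sketch of the build, test and deploy automation, a declarative Jenkins pipeline could tie the stages together; the registry, image name and test script are illustrative.

Jenkinsfile

pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // build and tag the application image
                sh 'docker build -t registry.example.com/myapp:${BUILD_NUMBER} .'
            }
        }
        stage('Test') {
            steps {
                // run the test suite inside a disposable container
                sh 'docker run --rm registry.example.com/myapp:${BUILD_NUMBER} ./run-tests.sh'
            }
        }
        stage('Deploy') {
            steps {
                // publish the image and roll the deployment to the new version
                sh 'docker push registry.example.com/myapp:${BUILD_NUMBER}'
                sh 'kubectl set image deployment/myapp myapp=registry.example.com/myapp:${BUILD_NUMBER}'
            }
        }
    }
}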

Regardless of the tools you choose, the most important characteristic of a cloud native application is that all of the activities listed are automated and deterministic.


The Road to PaaS

I have observed that discussions about CloudFoundry often lack accurate context. Some questions I get that indicate context is missing include:

  • What Java version does CloudFoundry support?
  • What database products/versions are available?
  • How can I access the server directly?

There are a few reasons that the questions above are not relevant for CloudFoundry (or any modern PaaS environment). To understand why, it’s important to understand how we got to PaaS and where we came from.

[Diagram: CloudFoundry compared with traditional architecture]

Landscape

When computers were first becoming a common requirement for the enterprise, most applications were monolithic. All application components would run on the same general purpose server. This included the interface, application technology (e.g. Java, .NET or PHP) and data and file storage. Over time, these functions were distributed across different servers. The servers also began to take on characteristic differences that would accommodate the technology being run.

Today, compute has been commoditized and virtualized. Rather than thinking of compute as a physical server, built to suit a specific purpose, compute is instead viewed in discrete chunks that can be scaled horizontally. PaaS today marries an application with those chunks of compute capacity as needed and abstracts application access to services, which may or may not run on the same PaaS platform.

Contributor and Organization Dynamic

The roles of contributors and organizations have changed throughout the evolution of the landscape. Early monolithic systems required technology experts who were familiar with a broad range of technologies, including system administration, programming, networking, etc. As the functions were distributed, the roles became more defined by their specializations. Webmasters, DBAs and programmers became siloed. Some unintended conflicts complicated this more distributed architecture, due in part to the fact that efficiencies in one silo did not always align with the best interests of other silos.

DevOps

As the evolution pushed toward compute as a commodity, the newfound flexibility drove many frustrated technologists to reach beyond their respective silos to accomplish their design and delivery objectives. Programmers began to look at how different operating system environments and database technologies could enable them to produce results faster and more reliably. System administrators began to rethink system management in ways that abstracted hardware dependencies and decreased the complexity involved in augmenting compute capacity available to individual functions. Datastore, network, storage and other experts began a similar process of abstracting their offerings. This blending of roles and new dynamic of collaboration and contribution has come to be known as DevOps.

Interoperability

Interoperability between systems and applications in the days of monolithic application development made use of many protocols. This was due in part to the fact that each monolithic system exposed its services in different ways. As the above progression took place, the field of available protocols normalized. RESTful interfaces over HTTP have emerged as an accepted standard, and the serialization structures most common to REST are XML and JSON. This makes integration straightforward and provides for a high amount of reuse of existing services. This also makes services available to a greater diversity of devices.

Security and Isolation

One key development that made this evolution from compute as hardware to compute as a utility possible was effective isolation of compute resources on shared hardware. The first big step in this direction came in the form of virtualization. Virtualized hardware made it possible to run many distinct operating systems simultaneously on the same hardware. It also significantly reduced the time to provision new server resources, since the underlying hardware was already wired and ready.

Compute as a ________

The next step in the evolution came in the form of containers. Unlike virtualization, containers made it possible to provide an isolated, configurable compute instance in much less time, while consuming fewer system resources to create and manage (i.e. lightweight). This progression from compute as hardware, to compute as virtual, and finally to compute as a container made it realistic to literally view compute as discrete chunks that can be created and destroyed in seconds as capacity requirements change.
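A quick way to feel that difference is to time a container start; this assumes the ubuntu:14.04 image has already been pulled locally.

# a container starts in roughly a second, versus minutes for a VM
time docker run --rm ubuntu:14.04 echo "compute as a container"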

Infrastructure as Code

Another important observation regarding the evolution of compute is that as the compute environment became easier to create (time to provision decreased), the process to provision changed. When a physical server required ordering, shipping, mounting, wiring, etc., it was reasonable to take a day or two to install and configure the operating system, network and related components. When that hardware was virtualized and could be provisioned in hours (or less), system administrators began to pursue more automation to accommodate the setup of these systems (e.g. ansible, puppet, chef and even Vagrant). This made it possible to think of systems as more transient. With the advent of Linux containers, the idea of infrastructure as code became even more prevalent. Time to provision is approaching zero.
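For a flavor of that automation, here is a minimal ansible playbook sketch; the inventory group and package are placeholders.

site.yml

---
- hosts: webservers            # hypothetical inventory group
  become: yes
  tasks:
    - name: install nginx
      apt:
        name: nginx
        state: present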

A related byproduct of infrastructure defined by scripts or code was reproducibility. Whereas it was historically difficult to ensure that two systems were configured identically, the method for provisioning containers made it trivial to ensure that compute resources were identically configured. This in turn improved debugging and collaboration and accommodated versioning of operating environments.

Contextual Answers

Given that the landscape has changed so drastically, let’s look at some possible answers to the questions from the beginning of this post.

  • Q. What Java (or any language) version does CloudFoundry support?
    A. It supports any language that is defined in the scripts used to provision the container that will run the application. While it is true that some such scripts may be available by default, this doesn’t imply that the PaaS provides only that. If it’s a fit, use it. If not, create new provisioning scripts.
  • Q. What database products/versions are available?
    A. Any database product or version can be used. If the datastore services associated with the PaaS by default are not sufficient, bring your own or create another application component to accommodate your needs.
  • Q. How can I access the server directly?
    A. There is no “the server.” If you want to know more about the server environment, look at the script/code that is responsible for provisioning it. Even better, create a new container and play around with it. Once you get things just right, update your code so that every new container incorporates the desired changes. Every “the server” will look exactly how you define it.

Nginx in Docker for Stackato buildpacks

I’m about to write a few articles about creating buildpacks for Stackato, which is a derivative of CloudFoundry and the technology behind Helion Development Platform. The approach for deploying nginx in docker as part of a buildpack differs from the approach I published previously. There are a few reasons for this:

  • Stackato discourages root access to the docker image
  • All services will run as the stackato user
  • The PORT to which services must bind is assigned and in the free range (no root required)

Get a host setup to run docker

The easiest way to follow along with this tutorial is to deploy stackato somewhere like hpcloud.com. The resulting host will have the docker image you need to follow along below.

Manually prepare a VM to run docker containers

You can also use Vagrant to spin up a virtual host where you can run these commands.

vagrant init ubuntu/trusty64

Modify the Vagrantfile to contain this line

  config.vm.provision :shell, path: "bootstrap.sh"

Then create the bootstrap.sh file based on the details below.

bootstrap.sh

#!/usr/bin/env bash
 
# set proxy variables
#export http_proxy=http://proxy.example.com:8080
#export https_proxy=https://proxy.example.com:8080
 
# install docker from the docker apt repository
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
apt-get update
apt-get -y install lxc-docker
 
# need to add proxy specifically to docker config since it doesn't pick them up from the environment
#sed -i '$a export http_proxy=http://proxy.example.com:8080' /etc/default/docker
#sed -i '$a export https_proxy=https://proxy.example.com:8080' /etc/default/docker
 
# enable non-root use by vagrant user
groupadd docker
gpasswd -a vagrant docker
 
# restart to enable proxy
service docker restart
# give docker a few seconds to start up
sleep 2s
 
# load in the stackato/stack-alsek image (can take a while)
docker load < /vagrant/stack-alsek.tar

Get the stackato docker image

Before the last line in the above bootstrap.sh script will work, it’s necessary to place the docker image for Stackato in the same directory as the Vagrantfile. Unfortunately the Stackato docker image is not published independently, which makes it more complicated to get. One way to do this is to deploy stackato locally and grab a copy of the image with this command.

docker save stackato/stack-alsek > stack-alsek.tar

You might want to save a copy of stack-alsek.tar to save time in the future. I’m not sure if it can be published (legally), but you’ll want to update this image with each release of Stackato anyway.

Launch a new docker instance using the Stackato image

You should now have three files in the directory where you did ‘vagrant init’.

-rw-rw-rw-   1 user     group          4998 Oct  9 14:01 Vagrantfile
-rw-rw-rw-   1 user     group           971 Oct  9 14:23 bootstrap.sh
-rw-rw-rw-   1 user     group    1757431808 Oct  9 14:02 stack-alsek.tar

At this point you should be ready to create a new VM and spin up a docker instance. First tell Vagrant to build the virtual server.

vagrant up

Next, log in to your server and create the docker container.

vagrant@vagrant-ubuntu-trusty-64:~$ docker run -i -t stackato/stack-alsek:latest /bin/bash
root@33ad737d42cf:/#

Build and configure your services

Once you have a system set up and can create docker instances based on the Stackato image, you’re ready to craft your buildpack compile script. One of the first things I do is install the w3m browser so I can test my setup. In this example, I’m just going to build and test nginx. The same process could be used to build any number of steps into a compile script. It may be necessary to manage http_proxy, https_proxy and no_proxy environment variables for both the root and stackato users while completing the steps below.

apt-get -y install w3m

Since everything in a stackato DEA is expected to run as the stackato user, we’ll switch to that user and move into the home directory.

root@33ad737d42cf:/# su stackato
stackato@33ad737d42cf:/$ cd
stackato@33ad737d42cf:~$ pwd
/home/stackato/

Next I’m going to grab the source for nginx and configure and make.

wget -e use_proxy=yes http://nginx.org/download/nginx-1.6.2.tar.gz
tar xzf nginx-1.6.2.tar.gz
cd nginx-1.6.2
./configure
make

By this point nginx has been built successfully and we’re in the nginx source directory. Next I want to update the configuration file to use a non-privileged port. For now I’ll use 8080, but Stackato will assign the actual PORT when deployed.

mv conf/nginx.conf conf/nginx.conf.original
sed 's/\(^\s*listen\s*\)80/\1 8080/' conf/nginx.conf.original > conf/nginx.conf

I also need to make sure that there is a logs directory available for nginx error logs on startup.

mkdir logs

It’s time to test the nginx build, which we can do with the command below. A message should be displayed saying the test was successful.

stackato@33ad737d42cf:~/nginx-1.6.2$ ./objs/nginx -t -c conf/nginx.conf -p /home/stackato/nginx-1.6.2
nginx: the configuration file /home/stackato/nginx-1.6.2/conf/nginx.conf syntax is ok
nginx: configuration file /home/stackato/nginx-1.6.2/conf/nginx.conf test is successful

With the setup and configuration validated, it’s time to start nginx and verify that it works.

./objs/nginx -c conf/nginx.conf -p /home/stackato/nginx-1.6.2

At this point it should be possible to load the nginx welcome page using the command below.

w3m http://localhost:8080

[Screenshot: nginx welcome page rendered in w3m]

Next steps

If an application requires other resources, this same process can be followed to build, integrate and test them within a running docker container based on the stackato image.


Use Docker to Build a LEMP Stack (Dockerfiles)

I’ve been reviewing Docker recently. As part of that review, I decided to build a LEMP stack in Docker. I use Vagrant to create an environment in which to run Docker. For this experiment I chose to write Dockerfiles to create the Docker container images. I’ll be discussing the following files in this post.

Vagrantfile
bootstrap.sh
mysql/Dockerfile
mysql/mysqlpwdseed
nginx/Dockerfile
nginx/default
nginx/wall.php

Download the Docker LEMP files as a zip (docker-lemp.zip).

Spin up the Host System

I start with Vagrant to spin up a host system for my Docker containers. To do this I use the following files.

Vagrantfile

# -*- mode: ruby -*-
# vi: set ft=ruby :
 
# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"
 
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
  config.vm.box = "ubuntu/trusty64"
  config.vm.network "public_network"
  config.vm.provider "virtualbox" do |v|
    v.name = "docker LEMP"
    v.cpus = 1
    v.memory = 512
  end
  config.vm.provision :shell, path: "bootstrap.sh"
 
end

I start with Ubuntu and keep the size small. I create a public network to make it easier to test my LEMP setup later on.

bootstrap.sh

#!/usr/bin/env bash
 
# set proxy variables
#export http_proxy=http://proxy.example.com:8080
#export https_proxy=https://proxy.example.com:8080
 
# install docker from the docker apt repository
apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
sh -c "echo deb https://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
apt-get update
apt-get -y install lxc-docker
 
# need to add proxy specifically to docker config since it doesn't pick them up from the environment
#sed -i '$a export http_proxy=http://proxy.example.com:8080' /etc/default/docker
#sed -i '$a export https_proxy=https://proxy.example.com:8080' /etc/default/docker
 
# enable non-root use by vagrant user
groupadd docker
gpasswd -a vagrant docker
 
# restart to enable proxy
service docker restart

I’m working in a proxied environment, so I need to provide proxy details to the Vagrant host for subsequent Docker installation steps. Unfortunately Docker doesn’t key off the typical environment variables for proxy, so I have to define them explicitly in the docker configuration. This second proxy configuration allows Docker to download images from DockerHub. Finally I create a docker group and add the vagrant user to it so I don’t need sudo for docker commands.

At this point, it’s super easy to play around with Docker and you might enjoy going through the Docker User Guide. We’re ready to move on to create Docker container images for our LEMP stack.

Build the LEMP Stack

This LEMP stack will be split across two containers, one for MySQL and the other for Nginx+PHP. I build on Ubuntu as a base, which may increase the total size of the image in exchange for the ease of using Ubuntu.

MySQL Container

We’ll start with MySQL. Here’s the Dockerfile.

Dockerfile

# LEMP stack as a docker container
FROM ubuntu:14.04
MAINTAINER Daniel Watrous <email>
#ENV http_proxy http://proxy.example.com:8080
#ENV https_proxy https://proxy.example.com:8080
 
RUN apt-get update
RUN apt-get -y upgrade
# seed database password
COPY mysqlpwdseed /root/mysqlpwdseed
RUN debconf-set-selections /root/mysqlpwdseed
 
RUN apt-get -y install mysql-server
 
RUN sed -i -e"s/^bind-address\s*=\s*127.0.0.1/bind-address = 0.0.0.0/" /etc/mysql/my.cnf
 
RUN /usr/sbin/mysqld & \
    sleep 10s &&\
    echo "GRANT ALL ON *.* TO admin@'%' IDENTIFIED BY 'secret' WITH GRANT OPTION; FLUSH PRIVILEGES" | mysql -u root --password=secret &&\
    echo "create database test" | mysql -u root --password=secret
 
# persistence: http://txt.fliglio.com/2013/11/creating-a-mysql-docker-container/ 
 
EXPOSE 3306
 
CMD ["/usr/bin/mysqld_safe"]

Notice that I declare my proxy servers for the third time here. This one is so that the container can access package downloads. I then seed the root database passwords and proceed to install and configure MySQL. Before running CMD, I expose port 3306. Remember that this port will be exposed on the private network between Docker containers and is only accessible to linked containers when you start it the way I show below. Here’s the mysqlpwdseed file.

mysqlpwdseed

mysql-server mysql-server/root_password password secret
mysql-server mysql-server/root_password_again password secret

If you downloaded the zip file above and ran vagrant in the resulting directory, you should have the files above available in /vagrant/mysql. The following commands will build and start the MySQL container.

cd /vagrant/mysql
docker build -t "local/mysql:v1" .
docker images
docker run -d --name mysql local/mysql:v1
docker ps

At this point you should see a local image for MySQL and a running mysql container, as shown below.

[Screenshot: docker images and docker ps output showing the MySQL container]

Nginx Container

Now we’ll build the Nginx container with PHP baked in.

Dockerfile

# LEMP stack as a docker container
FROM ubuntu:14.04
MAINTAINER Daniel Watrous <email>
ENV http_proxy http://proxy.example.com:8080
ENV https_proxy https://proxy.example.com:8080
 
# install nginx
RUN apt-get update
RUN apt-get -y upgrade
RUN apt-get -y install nginx
RUN echo "daemon off;" >> /etc/nginx/nginx.conf
RUN mv /etc/nginx/sites-available/default /etc/nginx/sites-available/default.bak
COPY default /etc/nginx/sites-available/default
 
# install PHP
RUN apt-get -y install php5-fpm php5-mysql
RUN sed -i s/\;cgi\.fix_pathinfo\s*\=\s*1/cgi.fix_pathinfo\=0/ /etc/php5/fpm/php.ini
 
# prepare php test scripts
RUN echo "<?php phpinfo(); ?>" > /usr/share/nginx/html/info.php
ADD wall.php /usr/share/nginx/html/wall.php
 
# add volumes for debug and file manipulation
VOLUME ["/var/log/", "/usr/share/nginx/html/"]
 
EXPOSE 80
 
CMD service php5-fpm start && nginx

After setting the proxy, I install Nginx and update the configuration file. I then install PHP5 and update the php.ini file. I then use two different methods to create php files. If you’re deploying an actual application to a Docker container, you may not end up using either of these, but instead install a script that will grab your application from git or subversion.

Next I define two volumes. You’ll see shortly that this makes it straightforward to view logs and manage code for debug. Finally I expose port 80 for web traffic and the CMD references two commands using && to make sure both PHP and Nginx are running in the container.

default

server {
        listen 80 default_server;
        listen [::]:80 default_server ipv6only=on;
 
        root /usr/share/nginx/html;
        index index.php index.html index.htm;
 
        server_name localhost;
 
        location / {
                try_files $uri $uri/ =404;
        }
 
        error_page 404 /404.html;
 
        error_page 500 502 503 504 /50x.html;
        location = /50x.html {
                root /usr/share/nginx/html;
        }
 
        location ~ \.php$ {
                fastcgi_split_path_info ^(.+\.php)(/.+)$;
                fastcgi_pass unix:/var/run/php5-fpm.sock;
                fastcgi_index index.php;
                include fastcgi_params;
        }
}

This is my default Nginx configuration file.

wall.php

<?php
 
// database credentials (defined in group_vars/all)
$dbname = "test";
$dbuser = "admin";
$dbpass = "secret";
$dbhost = "mysql";
 
// query templates
$create_table = "CREATE TABLE IF NOT EXISTS `wall` (
   `id` int(11) unsigned NOT NULL auto_increment,
   `title` varchar(255) NOT NULL default '',
   `content` text NOT NULL default '',
   PRIMARY KEY  (`id`)
   ) ENGINE=MyISAM  DEFAULT CHARSET=utf8";
$select_wall = 'SELECT * FROM wall';
 
// Connect to and select database
$link = mysql_connect($dbhost, $dbuser, $dbpass)
    or die('Could not connect: ' . mysql_error());
echo "Connected successfully\n<br />\n";
mysql_select_db($dbname) or die('Could not select database');
 
// create table
$result = mysql_query($create_table) or die('Create Table failed: ' . mysql_error());
 
// handle new wall posts
if (isset($_POST["title"])) {
    $result = mysql_query("insert into wall (title, content) values ('".$_POST["title"]."', '".$_POST["content"]."')") or die('Create Table failed: ' . mysql_error());
}
 
// Performing SQL query
$result = mysql_query($select_wall) or die('Query failed: ' . mysql_error());
 
// Printing results in HTML
echo "<table>\n";
while ($line = mysql_fetch_array($result, MYSQL_ASSOC)) {
    echo "\t<tr>\n";
    foreach ($line as $col_value) {
        echo "\t\t<td>$col_value</td>\n";
    }
    echo "\t</tr>\n";
}
echo "</table>\n";
 
// Free resultset
mysql_free_result($result);
 
// Closing connection
mysql_close($link);
?>
 
<form method="post">
Title: <input type="text" name="title"><br />
Message: <textarea name="content"></textarea><br />
<input type="submit" value="Post to wall">
</form>

Wall is meant to be a simple example that shows both PHP and MySQL are working. The commands below will build this new image and start it as a linked container to the already running MySQL container.

cd /vagrant/nginx
docker build -t "local/nginx:v1" .
docker images
docker run -d -p 80:80 --link mysql:mysql --name nginx local/nginx:v1
docker ps

Your Docker environment now looks like the image below, with an image each for MySQL and Nginx, and two running containers that are linked together.

[Diagram: linked MySQL and Nginx containers]

As shown in the image above, we have mapped port 80 on the container to port 80 on the host system. This means we can discover the IP address of our host (remember the public network in the Vagrantfile) and load the web page.

[Screenshot: wall.php loaded in a browser]

Working with Running Containers

True to the single-responsibility container design of Docker, these running containers are only running their respective services. That means they aren’t running SSH (and they shouldn’t be). So how do we interact with them, such as viewing the log files on the Nginx server or connecting to MySQL? We use linked containers that leverage shared volumes or exposed ports.

Connect to MySQL

To connect to MySQL, create a new container from the same MySQL image and link it to the already running MySQL. Start that container with /bin/bash. You can then use the mysql binaries to connect. Notice that I identify the host as ‘mysql’. That’s because when I linked the containers, Docker added an alias in the /etc/hosts file of the new container that mapped the private Docker network IP to ‘mysql’.

docker run -i -t --link mysql:mysql local/mysql:v1 /bin/bash
mysql -u admin -h mysql test --password=secret

[Screenshot: connecting to MySQL from a linked container]

View Nginx Logs

It’s just as easy to view the Nginx logs. In this case I start up a plain Ubuntu container using the --volumes-from flag and indicate the running nginx container. From bash I can easily navigate to the Nginx logs or the html directory.

docker run -i -t --volumes-from nginx ubuntu /bin/bash

[Screenshot: viewing Nginx logs from a linked container]

References

https://docs.docker.com/userguide/dockerlinks/