Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged swarm

Software Engineering

High level view of Container Orchestration

Container orchestration is at the heart of a successful container architecture. Orchestration takes as input a definition of how a deployed application should look. This usually includes how many containers for a certain image are needed, volumes for persistent data, networking for communication between containers and awareness of various discovery mechanisms. Discovery may include such things as identifying other containers which are also participating with the application or how to access services required by the running containers. Here’s a high level view.

container-architecture

Infrastructure

Containers need infrastructure to run. Both virtual and physical infrastructure can be used to host containers. Some argue that it’s better to run containers directly on physical servers to get the maximum performance. While there are performance benefits, there is also more operational overhead in standing up and maintaining physical servers. Automation available in virtual environments often makes it easier to provision, monitor and remediate servers. Using virtual infrastructure also makes it possible to share capacity between different types of workloads, where some may not be optimized for containers. Tools like Docker cloud (formerly Tutum) and Rancher can streamline operations for virtual environments.

If all workloads will be containerized and top performance is critical, favor a physical deployment. If some applications will still require IaaS and capacity will be shared between various types of workloads, choose virtual.

Orchestration

Orchestration is the process by which containers are managed to ensure that a predefined application configuration is maintained. These often require a plain text definition (usually YAML) of which container images are wanted, networking between those containers, mounted volumes, etc. The orchestration tool is then given this definition, which it uses to pull the necessary images and create containers, setup networking and mount storage.

Kubernetes

Kubernetes (http://kubernetes.io/) is was originally contributed to the open source community by Google and was based on their decade old container technology Borg. It aims to be a comprehensive container management platform providing everything from orchestration to monitoring to service and discovery and more. It abstracts the container technology in what it calls a pod, making it possible to use Docker or rkt or any other technology that comes around in the future. For many people the appeal of this platform is that it has no direct tie back to a commercial vendor, so investment is more likely to be driven by the community.

Swarm

Docker Swarm is Docker’s orchestration layer. It is designed to integrate seamlessly with other Docker tools, including the Docker daemon and registry tools. Some of the appeal to Swarm has to do with simplicity. Swarn is more narrowly focused than kubernetes, which may suggest better focus and more flexibility in choosing the right solutions for each container management need, althought it’s optimal to stick with Docker solutions.

PaaS

As containers continue to grow in prominence, some PaaS solutions, such as cloudfoundry, are reworking their narrative to position themselves as container management systems. It is true that the current version of cloudfoundry supports direct deployment of Docker container images and provides platform components, like routing, health management and scaling. Some drawbacks to using a PaaS for container orchestration is that deployments become more prescriptive and it provides less granular control over container deployment and interactions.

Image management

Container images can be created in several ways, including using a mechanism like Dockerfile, or using other automation tools. Container images should never contain credentials or other sensitive data (see Discovery below). In some cases it may be appropriate to host an internal container registry. External registry options that provide private images may provide sufficient protection for some applications.

Another aspect of image security has to do with vulnerabilities. Some registry solutions provide image scanning tools that can detect vulnerabilities or out of date packages. When external images are used as a base for internal application images, these should be carefully curated and confirmed to be safe before using them to derive application images.

Automation

One motivation behind containerization is that it better accommodates Continuous Integration (CI) and Continuous Delivery (CD). When building CI/CD pipelines, it’s important that the orchestration layer make it easy to automate to lifecycle of containers for unittests, functional tests, load tests and other automatic verification of the current state of an applicaiton. The CI/CD pipeline may be responsible for both triggering container creation as well as creating the container image.

Two way communication with CI/CD tooling is important so that the end result of testing and validation can be reported and possibly acted on by the CI/CD tool affecting later stages.

Discovery

Discovery is the process by which a container identifies other containers and services or registers itself to be found by other containers with which it participates in order to function. Discovery may include scenarios such as finding a database or static file storage with data necessary to run, or identifying other containers across which requests are distributed in order to accommodate synchronization.

Two common solutions for Discovery include a distributed key/value store and DNS. A distributed key/value store, such as etcd, ensures that each physical node hosting containers has a synchronized set of key/value data. In this scenario, the orchestration tool can add details about newly created containers to the key/value store so that existing containers are aware of them. New containers can query the key/value store to identify related containers and services.

DNS based discovery (a popular tools is Consul) is very similar, except that DNS is used to manage resolution of services and containers based on URLs. In this way, new containers can simply call the predetermined URL and trust that the request will be routed to the appropriate container or resource. As containers change, DNS is updated in realtime so that no changes are required on individual containers.