Daniel Watrous on Software Engineering

A Collection of Software Problems and Solutions

Posts tagged PaaS

Software Engineering

Infrastructure as Code

One of the most significant enablers of IT and software automation has been the shift away from fixed infrastructure to flexible infrastructure. Virtualization, process isolation, resource sharing and other forms of flexible infrastructure have been in use for many decades in IT systems. It can be seen in early Unix systems, Java application servers and even in common tools such as Apache and IIS in the form of virtual hosts. If flexible infrastructure has been a part of technology practice for so long, why is it getting so much buzz now?

Infrastructure as Code

In the last decade, virtualization has become more accessible and transparent, in part due to text-based abstractions that describe infrastructure systems. There are many such abstractions that span IaaS, PaaS, CaaS (containers) and other platforms, but I see four major categories of tools that have emerged.

  • Infrastructure Definition. This is closest to defining actual servers, networks and storage.
  • Runtime or system configuration. This operates on compute resources to overlay system libraries, policies, access control, etc.
  • Image definition. This produces an image or template of a system or application that can then be instantiated.
  • Application description. This is often a composite representation of infrastructure resources and relationships that together deliver a functional system.

Right tool for the right job

I have observed a trend among these toolsets to expand their scope beyond one of these categories to encompass all of them. For example, rather than use a chain of tools such as Packer to define an image, HEAT to define the infrastructure and Ansible to configure the resources and deploy the application, someone will try to use Ansible to do all three. Why is that bad?

A tool like HEAT is directly tied to the OpenStack charter. It endeavors to adhere to the native APIs as they evolve. The tool is accessible, reportable and integrated into the OpenStack environment, where the managed resources are also visible. This can simplify troubleshooting and decrease development time. In my experience, a tool like Ansible generally lags behind in features and API support, and lacks native interface integration. Some argue that using a tool like Ansible makes the automation more portable between cloud providers. Given the different interfaces and underlying APIs, I haven’t seen this actually work. There is always a frustrating translation when changing providers, and in many cases there is additional frustration due to idiosyncrasies of the tool, which could have been avoided by using more native interfaces.

The point I’m driving at is that when a native, supported and integrated tool exists for a given stage of automation, it’s worth exploring, even if it represents another skill set for those who develop the automation. The insight gained can often lead to a more robust and appropriate implementation. In the end, a tool can call a combination of HEAT and Ansible as easily as just Ansible.
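As a minimal sketch of that last point, assuming a Heat template named `web.yaml`, an Ansible inventory `inventory.ini` and a playbook `site.yml` (all hypothetical names), a small Python driver can chain the native OpenStack CLI (`openstack stack create`, provided by the Heat client plugin) with `ansible-playbook`. It only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess
from typing import List


def build_commands(stack: str, template: str, inventory: str, playbook: str) -> List[List[str]]:
    """Return the Heat (provision) command followed by the Ansible (configure) command."""
    return [
        # Heat, via the OpenStack CLI, creates the stack and waits until it completes.
        ["openstack", "stack", "create", "--template", template, "--wait", stack],
        # Ansible then configures the servers and deploys the application.
        ["ansible-playbook", "-i", inventory, playbook],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> List[str]:
    """Print (dry run) or execute each command in order; check=True stops on failure."""
    rendered = []
    for cmd in commands:
        rendered.append(shlex.join(cmd))
        if dry_run:
            print(rendered[-1])
        else:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    run(build_commands("web-stack", "web.yaml", "inventory.ini", "site.yml"))
```

In a real pipeline the inventory would typically be generated from the stack's outputs between the two steps; that glue is omitted here.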

Containers vs. Platforms

Another lively discussion over the past few years revolves around where automation efforts should focus. AWS popularized the idea that automation at the IaaS layer was the way to go. Many companies have benefited from that, but many more have found the learning curve too steep and the cost of fixed resources too high. Then Heroku came along, promising to abstract away all the complexity of IaaS while still delivering its benefits. That benefit came at the cost of either reduced flexibility or a steep learning curve to create new deployment contexts (called buildpacks). When Docker provided a very easy way to produce a single-function image that could be quickly instantiated, it spawned a discussion about how the container lifecycle should be orchestrated.

Containers moved the concept of image creation away from general-purpose compute, which had been the focus of IaaS, and toward specialized compute, such as a single application executable. Start time and resource efficiency made containers more appealing than virtual servers, but questions about how to handle networking and storage remained. The Docker best practice of single-function containers drove up the number of instances compared to more complex virtual servers that filled multiple roles and had longer life cycles. Orchestration became the key to reliable container-based deployments.

The descriptive approaches that evolved to accommodate containers, such as Kubernetes, provide more ease and speed than IaaS while providing more transparency and control than PaaS. They make it possible to define an entire application deployment scenario, including images, networking, storage, configuration, routing, etc., in plain text and trust the Container as a Service (CaaS) platform to orchestrate it all.
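To make the idea of a plain text deployment description concrete, here is a sketch that builds a minimal Kubernetes Deployment as a Python dictionary and renders it as JSON, which `kubectl apply -f` accepts as readily as YAML. The app name `web`, the image `nginx:1.25`, the replica count and the port are illustrative assumptions.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int, port: int) -> dict:
    """Build a minimal Kubernetes Deployment describing the desired state."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": port}]}
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Save the output as web.json and apply it with: kubectl apply -f web.json
    print(json.dumps(deployment_manifest("web", "nginx:1.25", 3, 80), indent=2))
```

The orchestrator, not the author of the file, decides where the three replicas run; the file only states what should exist.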

Evolution

Up to this point, infrastructure as code has evolved from shell scripts (e.g. bash), to infrastructure definitions for IaaS tools, to configuration and image creation tools that describe what those environments look like, and finally to full application deployment descriptions. What remains to mature are configuration management, secret management and the regional distribution of compute, where locality matters for performance and edge data processing.


High level view of Container Orchestration

Container orchestration is at the heart of a successful container architecture. Orchestration takes as input a definition of how a deployed application should look. This usually includes how many containers for a certain image are needed, volumes for persistent data, networking for communication between containers and awareness of various discovery mechanisms. Discovery may include such things as identifying other containers which are also participating with the application or how to access services required by the running containers. Here’s a high level view.

[Figure: container architecture]

Infrastructure

Containers need infrastructure to run. Both virtual and physical infrastructure can be used to host containers. Some argue that it’s better to run containers directly on physical servers to get the maximum performance. While there are performance benefits, there is also more operational overhead in standing up and maintaining physical servers. Automation available in virtual environments often makes it easier to provision, monitor and remediate servers. Using virtual infrastructure also makes it possible to share capacity between different types of workloads, some of which may not be optimized for containers. Tools like Docker Cloud (formerly Tutum) and Rancher can streamline operations for virtual environments.

If all workloads will be containerized and top performance is critical, favor a physical deployment. If some applications will still require IaaS and capacity will be shared between various types of workloads, choose virtual.

Orchestration

Orchestration is the process by which containers are managed to ensure that a predefined application configuration is maintained. Orchestration tools usually require a plain text definition (often YAML) of which container images are wanted, the networking between those containers, mounted volumes, etc. The orchestration tool uses this definition to pull the necessary images, create containers, set up networking and mount storage.
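The word "maintained" is the important part: an orchestrator repeatedly compares the desired state with what is actually running and acts on the difference. The following is a simplified, hypothetical sketch of one pass of such a reconciliation step, assuming the desired state is just a replica count per image. Real orchestrators such as Kubernetes track far more (health, placement, rolling updates).

```python
from typing import Dict, List, Optional, Tuple

# (verb, image, container_id); container_id is None for a container not yet created.
Action = Tuple[str, str, Optional[str]]


def reconcile(desired: Dict[str, int], running: Dict[str, List[str]]) -> List[Action]:
    """One reconciliation pass: return the actions that move running toward desired.

    desired maps an image to its wanted replica count; running maps an image to
    the ids of its live containers.
    """
    actions: List[Action] = []
    for image in sorted(set(desired) | set(running)):
        want = desired.get(image, 0)
        have = running.get(image, [])
        if len(have) < want:
            actions.extend(("start", image, None) for _ in range(want - len(have)))
        elif len(have) > want:
            # Stop the surplus containers (here simply the last ones listed).
            actions.extend(("stop", image, cid) for cid in have[want:])
    return actions
```

An orchestrator would run a pass like this on a timer or on every change event, so a crashed container simply shows up as a missing replica and is started again.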

Kubernetes

Kubernetes (http://kubernetes.io/) was originally contributed to the open source community by Google and draws on Google’s experience with its decade-old container technology, Borg. It aims to be a comprehensive container management platform, providing everything from orchestration to monitoring to service discovery and more. It abstracts the container technology behind what it calls a pod, making it possible to use Docker, rkt or any other technology that comes along in the future. For many people, the appeal of this platform is that it has no direct tie to a single commercial vendor, so investment is more likely to be driven by the community.

Swarm

Docker Swarm is Docker’s orchestration layer. It is designed to integrate seamlessly with other Docker tools, including the Docker daemon and registry tools. Some of Swarm’s appeal has to do with simplicity. Swarm is more narrowly focused than Kubernetes, which may mean it does its one job well and leaves more flexibility in choosing the right solution for each of the other container management needs, although in practice it works best alongside other Docker solutions.

PaaS

As containers continue to grow in prominence, some PaaS solutions, such as CloudFoundry, are reworking their narrative to position themselves as container management systems. It is true that the current version of CloudFoundry supports direct deployment of Docker container images and provides platform components like routing, health management and scaling. Some drawbacks of using a PaaS for container orchestration are that deployments become more prescriptive and the platform provides less granular control over container deployment and interactions.

Image management

Container images can be created in several ways, including with a Dockerfile or other automation tools. Container images should never contain credentials or other sensitive data (see Discovery below). In some cases it may be appropriate to host an internal container registry. For some applications, external registry options that provide private images may offer sufficient protection.

Another aspect of image security has to do with vulnerabilities. Some registry solutions provide image scanning tools that can detect vulnerabilities or out of date packages. When external images are used as a base for internal application images, these should be carefully curated and confirmed to be safe before using them to derive application images.
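One lightweight way to enforce curated base images is to check every `FROM` line in a Dockerfile against an approved list before building. This sketch assumes a hypothetical internal registry `registry.internal` and a simple allowlist; it complements, rather than replaces, the vulnerability scanning mentioned above. Later `FROM` lines that refer to an earlier build stage by its `AS` name are not treated as external images.

```python
from typing import List, Set


def unapproved_bases(dockerfile: str, approved: Set[str]) -> List[str]:
    """Return external base images in FROM lines that are not on the approved list."""
    stages: Set[str] = set()
    rejected: List[str] = []
    for line in dockerfile.splitlines():
        parts = line.split()
        if not parts or parts[0].upper() != "FROM":
            continue
        # Drop flags such as --platform=linux/amd64 to get [image, "AS", name].
        args = [p for p in parts[1:] if not p.startswith("--")]
        if not args:
            continue
        image = args[0]
        if image not in stages and image not in approved:
            rejected.append(image)
        # "FROM image AS name" defines a stage that later FROM lines may reuse.
        if len(args) >= 3 and args[1].upper() == "AS":
            stages.add(args[2])
    return rejected
```

A CI job could fail the build whenever the returned list is non-empty, which keeps the curation decision in one reviewed allowlist.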

Automation

One motivation behind containerization is that it better accommodates Continuous Integration (CI) and Continuous Delivery (CD). When building CI/CD pipelines, it’s important that the orchestration layer make it easy to automate the lifecycle of containers for unit tests, functional tests, load tests and other automated verification of the current state of an application. The CI/CD pipeline may be responsible both for creating the container image and for triggering container creation.

Two way communication with CI/CD tooling is important so that the end result of testing and validation can be reported and possibly acted on by the CI/CD tool affecting later stages.
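A hypothetical sketch of that feedback loop: each stage reports a result, and the pipeline uses earlier results to decide whether later stages run. The stage names and the callables that stand in for container-based test runs are assumptions for illustration.

```python
from typing import Callable, Dict, List, Tuple

# A stage is a name plus a callable that runs it and returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Dict[str, str]:
    """Run stages in order, recording each result so later stages can act on it."""
    results: Dict[str, str] = {}
    for name, stage in stages:
        if "failed" in results.values():
            # An earlier failure has been reported back, so later stages are skipped.
            results[name] = "skipped"
            continue
        results[name] = "passed" if stage() else "failed"
    return results
```

In practice each callable would ask the orchestration layer to create test containers, run the suite and tear the containers down again.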

Discovery

Discovery is the process by which a container identifies other containers and services, or registers itself so that the other containers it works with can find it. Discovery may include scenarios such as finding a database or static file storage that holds data the container needs to run, or identifying the other containers across which requests are distributed or with which state must be synchronized.

Two common solutions for Discovery include a distributed key/value store and DNS. A distributed key/value store, such as etcd, ensures that each physical node hosting containers has a synchronized set of key/value data. In this scenario, the orchestration tool can add details about newly created containers to the key/value store so that existing containers are aware of them. New containers can query the key/value store to identify related containers and services.
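The key/value approach can be sketched with an in-memory dictionary standing in for etcd. The `/services/<name>/<container-id>` key layout is a hypothetical convention, and a real deployment would use the store's own client library, with its prefix queries and watches, instead of this class.

```python
from typing import Dict, List


class ServiceRegistry:
    """In-memory stand-in for a distributed key/value store such as etcd."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def register(self, service: str, container_id: str, address: str) -> None:
        # The orchestrator records each newly created container and its address.
        self._data[f"/services/{service}/{container_id}"] = address

    def deregister(self, service: str, container_id: str) -> None:
        # Removing the key when a container stops keeps lookups accurate.
        self._data.pop(f"/services/{service}/{container_id}", None)

    def lookup(self, service: str) -> List[str]:
        # A new container queries by key prefix to find peers or dependencies.
        prefix = f"/services/{service}/"
        return sorted(v for k, v in self._data.items() if k.startswith(prefix))
```

For example, after `register("web", "c2", "10.0.0.6:8080")`, any container can call `lookup("web")` to get the current list of web addresses.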

DNS-based discovery (a popular tool is Consul) is very similar, except that DNS is used to resolve services and containers by hostname. In this way, new containers can simply call a predetermined hostname and trust that the request will be routed to the appropriate container or resource. As containers change, DNS records are updated in real time so that no changes are required on individual containers.


The Road to PaaS

I have observed that discussions about CloudFoundry often lack accurate context. Some questions I get that indicate context is missing include:

  • What Java version does CloudFoundry support?
  • What database products/versions are available?
  • How can I access the server directly?

There are a few reasons that the questions above are not relevant for CloudFoundry (or any modern PaaS environment). To understand why, it’s important to understand how we got to PaaS and where we came from.

[Figure: CloudFoundry compared with the traditional approach]

Landscape

When computers were first becoming a common requirement for the enterprise, most applications were monolithic. All application components would run on the same general-purpose server. This included interface, application technology (e.g. Java, .NET and PHP) and data and file storage. Over time, these functions were distributed across different servers. The servers also began to take on characteristic differences that would accommodate the technology being run.

Today, compute has been commoditized and virtualized. Rather than thinking of compute as a physical server, built to suit a specific purpose, compute is instead viewed in discrete chunks that can be scaled horizontally. PaaS today marries an application with those chunks of compute capacity as needed and abstracts application access to services, which may or may not run on the same PaaS platform.

Contributor and Organization Dynamic

The roles of contributors and organizations have changed throughout the evolution of the landscape. Early monolithic systems required technology experts who were familiar with a broad range of technologies, including system administration, programming, networking, etc. As the functions were distributed, the roles became more defined by their specializations. Webmasters, DBAs and programmers became siloed. Some unintended conflicts complicated this more distributed architecture, in part because efficiencies in one silo did not always align with the best interests of other silos.

DevOps

As the evolution pushed toward compute as a commodity, the newfound flexibility drove many frustrated technologists to reach beyond their respective silos to accomplish their design and delivery objectives. Programmers began to look at how different operating system environments and database technologies could enable them to produce results faster and more reliably. System administrators began to rethink system management in ways that abstracted hardware dependencies and decreased the complexity involved in augmenting compute capacity available to individual functions. Datastore, network, storage and other experts began a similar process of abstracting their offerings. This blending of roles and new dynamic of collaboration and contribution has come to be known as DevOps.

Interoperability

Interoperability between systems and applications in the days of monolithic application development made use of many protocols. This was due in part to the fact that each monolithic system exposed its services in different ways. As the above progression took place, the field of available protocols normalized. RESTful interfaces over HTTP have emerged as an accepted standard, and the serialization formats most common to REST are XML and JSON. This makes integration straightforward and allows a high degree of reuse of existing services. It also makes services available to a greater diversity of devices.

Security and Isolation

One key development that made this evolution from compute as hardware to compute as a utility possible was effective isolation of compute resources on shared hardware. The first big step in this direction came in the form of virtualization. Virtualized hardware made it possible to run many distinct operating systems simultaneously on the same hardware. It also significantly reduced the time to provision new server resources, since the underlying hardware was already wired and ready.

Compute as a ________

The next step in the evolution came in the form of containers. Unlike virtualization, containers made it possible to provide an isolated, configurable compute instance in much less time and with fewer system resources to create and manage (i.e. lightweight). This progression from compute as hardware, to compute as virtual, and finally to compute as a container made it realistic to literally view compute as discrete chunks that could be created and destroyed in seconds as capacity requirements changed.

Infrastructure as Code

Another important observation regarding the evolution of compute is that as the compute environment became easier to create (time to provision decreased), the process to provision changed. When a physical server required ordering, shipping, mounting, wiring, etc., it was reasonable to take a day or two to install and configure the operating system, network and related components. When that hardware was virtualized and could be provisioned in hours (or less), system administrators began to pursue more automation to accommodate the setup of these systems (e.g. Ansible, Puppet, Chef and even Vagrant). This made it possible to think of systems as more transient. With the advent of Linux containers, the idea of infrastructure as code became even more prevalent. Time to provision is approaching zero.

A related byproduct of infrastructure defined by scripts or code was reproducibility. Whereas it was historically difficult to ensure that two systems were configured identically, the method for provisioning containers made it trivial to ensure that compute resources were identically configured. This in turn improved debugging and collaboration, and made it possible to version operating environments.
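Reproducibility also makes environments comparable. As a sketch, assuming an environment is described by a simple dictionary (the keys below are hypothetical), hashing a canonical JSON rendering yields a fingerprint that is identical for identical definitions regardless of key order, and that changes whenever the definition changes.

```python
import hashlib
import json


def environment_fingerprint(definition: dict) -> str:
    """Return a SHA-256 hex digest of a canonical JSON rendering of the definition."""
    # sort_keys and fixed separators make logically equal dictionaries serialize identically.
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Note that list order still matters, so `["nginx", "curl"]` and `["curl", "nginx"]` produce different fingerprints; storing the fingerprint alongside each versioned definition makes drift between two environments easy to spot.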

Contextual Answers

Given that the landscape has changed so drastically, let’s look at some possible answers to the questions from the beginning of this post.

  • Q. What Java (or any language) version does CloudFoundry support?
    A. It supports any language that is defined in the scripts used to provision the container that will run the application. While it is true that some such scripts may be available by default, this doesn’t imply that the PaaS provides only that. If it’s a fit, use it. If not, create new provisioning scripts.
  • Q. What database products/versions are available?
    A. Any database product or version can be used. If the datastore services associated with the PaaS by default are not sufficient, bring your own or create another application component to accommodate your needs.
  • Q. How can I access the server directly?
    A. There is no “the server.” If you want to know more about the server environment, look at the script/code that is responsible for provisioning it. Even better, create a new container and play around with it. Once you get things just right, update your code so that every new container incorporates the desired changes. Every “the server” will look exactly how you define it.

Overview of CloudFoundry

CloudFoundry is an open source Platform as a Service (PaaS) technology, originally developed at VMware and later commercially supported by Pivotal. The software makes it possible to very easily stage, deploy and scale applications, thanks in part to its adoption of buildpacks, which were originally introduced by Heroku.

Some software design principles are required to achieve scale with CloudFoundry. The most notable design choice is a complete abstraction of persistence, including filesystem, datastore and even in-memory cache. This is because instances of an application are transient and stateless. Since this is generally good design anyway, many applications may find it easy to migrate to CloudFoundry.

CloudFoundry Internals

CloudFoundry can be viewed from two perspectives: that of CloudFoundry internals and that of application developers who want to deploy on CloudFoundry. The image directly below is from the CloudFoundry architecture overview.

[Figure: CloudFoundry architecture block diagram]

This shows the internal components that make up CloudFoundry. The router is the internet-facing component that matches requests with application instances and load balances among the instances of an application. Unlike many traditional load balancers, it is not built around sticky sessions, since application instances are assumed to be stateless.
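Because instances are stateless, the router can send each request to any instance of the app. A simplified round-robin sketch of that behavior follows; the hostname and instance addresses are hypothetical, and the actual CloudFoundry router's balancing algorithm may differ.

```python
import itertools
from typing import Dict, Iterator, List


class RoundRobinRouter:
    """Distribute requests for each app across its instances, with no session affinity."""

    def __init__(self, routes: Dict[str, List[str]]) -> None:
        # routes maps a hostname to the addresses of that app's instances.
        self._cycles: Dict[str, Iterator[str]] = {
            host: itertools.cycle(instances) for host, instances in routes.items()
        }

    def route(self, host: str) -> str:
        # Any instance can serve any request because instances keep no local state.
        return next(self._cycles[host])
```

If an instance held session data in memory, a user's second request could land on a different instance and lose it, which is why persistence has to live in external services.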

The DEA (Droplet Execution Agent) is a compute resource (usually a virtual server, but it can be any compute resource, including bare metal). CloudFoundry uses a technology called Warden for containerization. Other distributions use alternative technologies, like Docker.

Services, such as database, cache, filesystem, etc. must implement a Service Broker API. Through this API, CloudFoundry is able to discover, provision and facilitate communication of credentials to each instance of an application.
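From the application's side, CloudFoundry delivers the credentials for bound services to each instance through the `VCAP_SERVICES` environment variable, a JSON document keyed by service label whose entries include a `name` and a `credentials` object. The sketch below reads it; the service name `orders-db` and the credential fields used in the example are hypothetical.

```python
import json
import os
from typing import Any, Dict, Mapping, Optional


def service_credentials(name: str, env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Return the credentials of the bound service instance called `name`."""
    source = os.environ if env is None else env
    # VCAP_SERVICES maps each service label to a list of bound instances.
    services = json.loads(source.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            if instance.get("name") == name:
                return instance.get("credentials", {})
    raise KeyError(f"no bound service named {name!r}")
```

An app bound to `orders-db` would call `service_credentials("orders-db")` at startup instead of baking a connection string into its image.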

Application Development

Application developers interact with CloudFoundry in a few different ways. Two common methods are the command line client and the Eclipse plugin. Using these tools, developers can log in to a CloudFoundry installation and deploy apps under organizations and spaces. The following diagram illustrates this.

[Figure: CloudFoundry organizations, spaces and applications]

When developers are ready to deploy, they push their app. CloudFoundry then identifies a suitable buildpack and stages the application, resulting in a droplet. A droplet is a compressed tar file that contains all the application files and runtime dependencies. After staging, app instances are created, the droplet is extracted on each new instance, and the application is started. If services are required, these are provisioned when the application is pushed.
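The staging step can be sketched as follows: each buildpack's detect step inspects the pushed files, the first match is selected, and the result is packaged as a compressed tar file. The three buildpacks and their marker files are simplified assumptions, and the compile step that real buildpacks run to add the runtime and dependencies is omitted here.

```python
import io
import tarfile
from typing import Callable, Dict, List, Tuple

# Hypothetical, simplified buildpacks: (name, detect function over the app's file names).
Buildpack = Tuple[str, Callable[[List[str]], bool]]

BUILDPACKS: List[Buildpack] = [
    ("java", lambda files: "pom.xml" in files),
    ("nodejs", lambda files: "package.json" in files),
    ("python", lambda files: "requirements.txt" in files),
]


def detect(files: List[str]) -> str:
    """Return the first buildpack whose detect step accepts the app."""
    for name, accepts in BUILDPACKS:
        if accepts(files):
            return name
    raise ValueError("no suitable buildpack found")


def make_droplet(app_files: Dict[str, bytes]) -> bytes:
    """Package app files into a gzip-compressed tar archive (a stand-in for a droplet)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for path, content in sorted(app_files.items()):
            info = tarfile.TarInfo(name=path)
            info.size = len(content)
            tar.addfile(info, io.BytesIO(content))
    return buf.getvalue()
```

Ordering matters in this sketch: an app containing both `pom.xml` and `package.json` is staged with the first matching buildpack, `java`.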

Distributions

There are many contributors to open source CloudFoundry. This has resulted in various distributions of CloudFoundry aside from Pivotal’s commercial offering.

ActiveState Stackato

ActiveState distributes CloudFoundry under the brand name Stackato. Some notable differences include the use of Docker for instances and a web interface that includes an app store for deploying common applications.

HP Helion Development Platform

Hewlett Packard offers an enterprise-focused distribution of Stackato as HP Helion Development Platform. The enterprise focus includes an emphasis on the ability to use private cloud, public cloud and traditional IT to deploy and scale mission-critical applications cost-effectively, securely and reliably.

Getting started with CloudFoundry

It’s easy to get started with CloudFoundry. Here are a couple of tutorials that will get you ready to quickly deploy apps.

CloudFoundry on HPCloud.com
Install CloudFoundry in VirtualBox