
PV Calls: a new paravirtualized protocol for POSIX syscalls

Let's take a step back and look at the current state of virtualization in the software industry. x86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. That surface can be reduced, but doing so is a difficult task that requires minute knowledge of the app running inside, and at any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application packaged to be self-contained, complying with the OCI Image Format or the Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for "Portable Operating System Interface": it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result, a Linux system exposes a very high number of calls and, inescapably, a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with this, but hypervisors are very accomplished in this respect; after all, hypervisors don't need full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps: they duplicate functionality between the guest and the host. For example, the network stack is traversed twice, first in DomU and then in Dom0. This is unnecessary. It is better to raise the hypervisor's abstractions by paravirtualizing a small set of syscalls directly.
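To make the restriction problem concrete, below is a minimal sketch of what default-deny syscall filtering looks like on Linux with seccomp-BPF. The three allowed calls are purely illustrative; a real policy needs the complete list of syscalls the application actually uses, which is exactly the "minute knowledge" that is so hard to obtain at scale.

```c
/* Minimal sketch: a default-deny seccomp-BPF filter.
 * The allowed syscalls are illustrative only; a production filter
 * would also validate seccomp_data.arch before trusting the number. */
#include <stddef.h>
#include <stdio.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

#define ALLOW(nr) \
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, (nr), 0, 1), \
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW)

int main(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number from the seccomp_data argument. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        ALLOW(SYS_read),
        ALLOW(SYS_write),
        ALLOW(SYS_exit_group),
        /* Anything not explicitly allowed kills the process. */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* NO_NEW_PRIVS is required before an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0 ||
        prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0) {
        perror("prctl");
        return 1;
    }
    /* Only read, write and exit_group are permitted from here on. */
    write(STDOUT_FILENO, "filter installed\n", 17);
    return 0;
}
```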

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls sit at a higher level of abstraction and are made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0. I also wrote a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring. The backend implements the syscalls and returns success or failure to the caller. The protocol creates a new ring for each active socket, and the ring size is configurable on a per-socket basis. Received data is copied to the ring by the backend, while data to be sent is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
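The full ring format is defined in the specification, but as a rough illustration, a request/response pair on the command ring might look something like the sketch below. All field names and the layout here are hypothetical and simplified; the authoritative definitions are in the PV Calls document posted on xen-devel.

```c
/* Hypothetical, simplified sketch of PV Calls command-ring entries.
 * The real structures are defined by the specification on xen-devel. */
#include <stdint.h>

struct pvcalls_request {
    uint32_t req_id;           /* echoed back in the matching response */
    uint32_t cmd;              /* e.g. SOCKET, CONNECT, LISTEN, ACCEPT, POLL */
    union {
        struct {
            uint64_t id;       /* socket identifier chosen by the frontend */
            uint32_t domain;   /* e.g. AF_INET */
            uint32_t type;     /* e.g. SOCK_STREAM */
            uint32_t protocol;
        } socket;
        struct {
            uint64_t id;       /* socket to connect */
            uint8_t  addr[28]; /* serialized struct sockaddr */
            uint32_t len;      /* length of the address */
            uint32_t ref;      /* grant reference of the per-socket data ring */
            uint32_t evtchn;   /* event channel used for data notifications */
        } connect;
        /* ... listen, accept, poll, release ... */
    } u;
};

struct pvcalls_response {
    uint32_t req_id;           /* matches the originating request */
    uint32_t cmd;
    int32_t  ret;              /* 0 on success, -errno on failure */
};
```

The per-socket data ring referenced by `ref` then carries the actual recvmsg/sendmsg payloads, with the backend writing received data and the frontend writing data to send, as described above.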

We are still running virtual machines, but mainly to restrict the vast majority of the application's syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn't come with the app), implements the syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored to container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker's default networking model, containers' communications appear to come from the host IP address, and containers' listening ports must be explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications originate from the host IP address directly, and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

[Graph: iperf client bandwidth]

[Graph: iperf server bandwidth]

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container, in the case of Docker). PV Calls reach 75 Gbit/s with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container, in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 Gbit/s and outperform not just netfront/netback but even Docker.

The benchmarks were run on an Intel Xeon D-1540 machine with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 pinned vCPUs each, and DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very much appreciated!

Will Docker Replace Virtual Machines?

Docker is certainly the most influential open source project of the moment. Why is Docker so successful? Is it going to replace Virtual Machines? Will there be a big switch? If so, when?

Let's look at the past to understand the present and predict the future. Before virtual machines, system administrators used to provision physical boxes for their users. The process was cumbersome, not fully automated, and it took hours if not days. When something went wrong, they had to run to the server room to replace the physical box.

With the advent of virtual machines, DevOps could install any hypervisor on all their boxes, then simply provision new virtual machines upon request from their users. Provisioning a VM took minutes instead of hours and could be automated. The underlying hardware made less of a difference and was mostly commoditized. If users needed more resources, they would just create a new VM. If a physical machine broke, the admin just migrated or resumed her VMs on a different host.

Finer-grained deployment models became viable and convenient. Users were no longer forced to run all their applications on the same box just to exploit the underlying hardware to the fullest. One could run a VM with the database, another with the middleware and a third with the webserver, without worrying about hardware utilization. The people buying the hardware and the people architecting the software stack could work independently in the same company, without interference. The new interface between the two teams had become the virtual machine. Solution architects could cheaply deploy each application on a different VM, reducing their maintenance costs significantly. Software engineers loved it. This might have been the biggest innovation introduced by hypervisors.

A few years passed and everybody in the business got accustomed to working with virtual machines. Startups don't even buy server hardware anymore; they just shop on Amazon AWS. One virtual machine per application is the standard way to deploy software stacks.

Application deployment hadn't changed much since the '90s though. It still involved installing a Linux distro, mostly built for physical hardware, installing the required deb or rpm packages, and finally installing and configuring the application that one actually wanted to run.

In 2013 Docker came out with a simple yet effective tool to create, distribute and deploy applications wrapped in a nice format to run in independent Linux containers. It comes with a registry that is like an app store for these applications, which I'll call "cloud apps" for clarity. Deploying the Nginx webserver had just become one "docker pull nginx" away. This is much quicker and simpler than installing the latest Ubuntu LTS. Docker cloud apps come preconfigured and without any of the unnecessary packages that Linux distros unavoidably install. In fact, the Nginx Docker cloud app is produced and distributed by the Nginx community directly, rather than by Canonical or Red Hat.

Docker's outstanding innovation is the introduction of a standard format for cloud applications, together with the registry that distributes them. Linux containers, rather than VMs, are used to run these cloud apps. Containers had been available for years, but they weren't particularly popular outside Google and a few other circles. Although they offer very good performance, they have fewer features and weaker isolation than virtual machines. Docker, as a rising star, made Linux containers suddenly popular, but containers were not the reason behind Docker's success. They were incidental.

What is the problem with containers? Their live-migration support is still very green, and they cannot run non-native workloads (Windows on Linux or Linux on Windows). Most importantly, the primary challenge with containers is security: the attack surface is far larger than that of virtual machines. In fact, multi-tenant container deployments are strongly discouraged by Docker, CoreOS, and everybody else in the industry. With virtual machines you don't have to worry about who is going to use them or how they will be used. With containers, on the other hand, only containers that belong to the same user should be run on the same host. Amazon and Google offer container hosting, but they both run each container on top of a separate virtual machine for isolation and security. Maybe inefficient, but certainly simple and effective.

People are starting to notice this. At the beginning of the year a few high-profile projects launched to bring the benefits of virtual machines to Docker, in particular Clear Linux by Intel and Hyper. Both of them use conventional virtual machines to run Docker cloud applications directly (no Linux containers are involved). We ran a few tests with Xen: tuning the hypervisor for this use case allowed us to reach the same startup times offered by Linux containers, while retaining all the other features. A similar effort by Intel for Xen is being presented at the Xen Developer Summit, and Hyper is presenting their work as well.

This new direction has the potential to deliver the best of both worlds to our users: the convenience of Docker with the security of virtual machines. Soon Docker might not be fighting virtual machines at all; Docker could be the one deploying them.

A Chinese translation of the article is available here: http://dockone.io/article/598

New Hyper Open Source Project Allows Developers To Leverage Docker and Xen Virtualization Infrastructure

Docker's popularity and usefulness in cloud systems architectures are evident, having won over countless developers. Yet it's not a replacement for the mature, proven and security-hardened virtualization technologies that support many of the world's largest clouds in production.

So, while developers clearly want to take advantage of container technology to easily package applications, they also need a seamless migration path to their existing virtual infrastructure. That's where our new partner Hyper, announced today, comes into play.

Hyper, a China-based company with a new open source project of the same name, allows developers to run Docker images with Xen Project virtualization, version 4.5 or later. A download is available here.

“Hyper offers the best of both worlds — VMs and containers,” said Xu Wang, Co-Founder at Hyper. “Our technology allows enterprises to leverage any mature, implemented virtualization infrastructure and eliminate unwanted complexity and also take advantage of container technology to easily package applications. We are partnering with Xen more closely to help developers get more out of their hypervisor, while also enjoying the benefits of container technology.”

To learn more, be sure to check out the presentation “Hyper: Make VM Run Like Containers” at Xen Project Developer Summit, Aug. 17-18. You’ll also find them at The Linux Foundation’s new ContainerCon event, as a bronze sponsor.

Hyper Enables the Next-Generation Container-as-a-Service

CaaS (Container-as-a-Service) is gaining traction in cloud computing by leveraging the portability of Docker to avoid various technical limitations of Platform-as-a-Service. However, the shared-kernel approach introduces unnecessary complexity, overcapacity and insecurity.

To eliminate these problems, Hyper uses virtualization to achieve hardware-enforced isolation. Unlike a VM + container approach, Hyper does not employ a guest OS in the VM instance. Instead, a HyperKernel, a customized Linux kernel that includes Docker functionality, is loaded to host the Docker images. Hyper guests also do not require the usual Linux container technology: LXC, cgroups, most namespaces and the Docker daemon are all unnecessary. The only requirement is the mount namespace, used to support pods of Docker images.
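To give a sense of how little kernel machinery that implies, below is a minimal sketch (not Hyper's actual code) of a privileged process giving itself a private mount namespace, the one namespace the pod model still relies on.

```c
/* Minimal sketch: entering a private mount namespace on Linux.
 * Illustrative only; must run with CAP_SYS_ADMIN. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <sys/mount.h>

int main(void)
{
    /* Detach this process's view of mounts from the rest of the system. */
    if (unshare(CLONE_NEWNS) != 0) {
        perror("unshare(CLONE_NEWNS)");
        return 1;
    }
    /* Mark all mounts private so changes do not propagate back out. */
    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) != 0) {
        perror("mount(MS_PRIVATE)");
        return 1;
    }
    /* From here, each Docker image in a pod could be mounted at its own
     * root without affecting any other pod or the host. */
    puts("now in a private mount namespace");
    return 0;
}
```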


To learn more about this minimalist approach, which also offers sub-second boot, rapid ROI, enhanced security, a minimal resource footprint, low overhead and more, check out these additional resources:

Continuous innovation is the lifeblood of any project, and Xen Project is fortunate to have an extremely active and growing community. Partners like Hyper allow Xen Project to stay one step ahead of the industry with security, performance and scalability as cloud and computing infrastructures evolve.


Why Unikernels Can Improve Internet Security

This is a reprint of a 3-part unikernel series published on Linux.com. In this post, Xen Project Advisory Board Chairman Lars Kurth explains how unikernels address security and allow for the careful management of particularly critical portions of an organization’s data and processing needs. (See part one, 7 Unikernel Projects to Take On Docker in 2015.)

Many industries are rapidly moving toward networked, scale-out designs with new and varying workloads and data types. Yet pick any industry, whether retail, banking, health care, social networking or entertainment, and you'll find that security risks and vulnerabilities are highly problematic, costly and dangerous.

Adam Wick, creator of the Haskell Lightweight Virtual Machine (HaLVM) and a research lead at Galois Inc., which counts the U.S. Department of Defense and DARPA as clients, says 2015 is already turning out to be a break-out year for security.

“Cloud computing has been a hot topic for several years now, and we’ve seen a wealth of projects and technologies that take advantage of the flexibility the cloud offers,” said Wick. “At the same time though, we’ve seen record-breaking security breach after record-breaking security breach.”

The breach names are more evocative and well known thanks to online news and social media, but low-level bugs have always plagued network services, Wick said. So why is security more important today than ever before?

Improving Security

The creator of MirageOS, Anil Madhavapeddy, says it’s “simply irresponsible to continue to knowingly provision code that is potentially unsafe, and especially so as we head into a year full of promise about smart cities and ubiquitous Internet of Things. We wouldn’t build a bridge on top of quicksand, and should treat our online infrastructure with the same level of respect and attention as we give our physical structures.”

In the hopes of improving security, performance and scalability, there’s a flurry of interesting work taking place around blocking out functionality into containers and lighter-weight unikernel alternatives. Galois, which specializes in R&D for new technologies, says enterprises are increasingly interested in the ability to cleanly separate functionality to limit the effect of a breach to just the component affected, rather than infecting the whole system.

For next-generation clouds and in-house clouds, unikernels make it possible to run thousands of small VMs per host. Galois, for example, uses this capability in their CyberChaff project, which uses minimal VMs to improve intrusion detection on sensitive networks, while others have used similar mechanisms to save considerable cost in hardware, electricity and cooling, all while reducing the attack surface exposed to malicious hackers. These are welcome developments for anyone concerned with system and network security, and they help to explain why traditional hypervisors will remain relevant for a wide range of customers well into the future.

Madhavapeddy goes so far as to say that certain unikernel architectures would have directly prevented last year's Heartbleed and Shellshock bugs.

“For example, end-to-end memory safety prevents Heartbleed-style attacks in MirageOS and the HaLVM. And an emphasis on compile-time specialization eliminates complex runtime code such as Unix shells from the images that are deployed onto the cloud,” he said.

The MirageOS team has also put their stack to the test by releasing a "Bitcoin Piñata," a unikernel that guards a collection of Bitcoins. The Bitcoins can only be claimed by breaking through the unikernel's security (for example, by compromising the SSL/TLS stack) and then moving the coins. If the Bitcoins are indeed transferred away, the public transaction record will reflect that there is a security hole to be fixed. The contest has been running since February 2015 and the Bitcoins have not yet been taken.


Linux container vs. unikernel security

Linux, as well as Linux containers and Docker images, relies on a fairly heavyweight core OS to provide critical services. Because of this, a vulnerability in the Linux kernel affects every Linux container, Wick said. Unikernels instead take an approach similar to an à la carte menu, including only the minimal functionality and services needed to run an application, all of which makes writing an exploit to attack them much more difficult.

Cloudius Systems, which is running a private beta of OSv, its "operating system for the cloud," recognizes that progress is being made on this front.

"Rocket is indeed an improvement over Docker, but containers aren't a multi-tenant solution by design," said CEO Dor Laor. "No matter how many SELinux policies you throw on containers, the attack surface will still span all aspects of the kernel."

Martin Lucina works on the Rump Kernel software stack, which enables running existing, unmodified POSIX software without an operating system on various platforms, from bare-metal embedded systems to unikernels on Xen. He explains that unikernels running on the Xen Project hypervisor benefit from the strong isolation guarantees of hardware virtualization and from a trusted computing base that is orders of magnitude smaller than that of container technologies.

“There is no shell, you cannot exec() a new process, and in some cases you don’t even need to include a full TCP stack. So there is very little exploit code can do to gain a permanent foothold in the system,” Lucina said.

The key takeaway for organizations worried about security is that they should treat their infrastructure in a less monolithic way. Unikernels allow for the careful management of particularly critical portions of an organization’s data and processing needs. While it does take some extra work, it’s getting easier every day as more developers work on solving challenges with orchestration, logging and monitoring. This means unikernels are coming of age just as many developers are getting serious about security as they begin to build scale-out, distributed systems.

For those interested in learning more about unikernels, the entire series is available as a white paper titled “The Next Generation Cloud: The Rise of the Unikernel.”

Read part 1: 7 Unikernel Projects to Take On Docker in 2015

7 Unikernel Projects to Take On Docker in 2015

This is a reprint of a 3-part unikernel series published on Linux.com. In part one, Xen Project Advisory Board Chairman Lars Kurth takes a closer look at the rise of unikernels and several up-and-coming projects to keep close tabs on in the coming months.

Docker and Linux container technologies dominate headlines today as a powerful, easy way to package applications, especially as cloud computing becomes more mainstream. While still a work-in-progress, they offer a simple, clean and lean way to distribute application workloads.

With enthusiasm continuing to grow for container innovations, a related technology called unikernels is also beginning to attract attention. Known for their ability to cleanly separate functionality at the component level, unikernels are enabling a variety of new approaches to deploying cloud services.

Traditional operating systems run multiple applications on a single machine, managing resources and isolating applications from one another. A unikernel runs a single application on a single virtual machine, relying instead on the hypervisor to isolate those virtual machines. Unikernels are constructed using "library operating systems," from which the developer selects only the minimal set of services required for an application to run. These sealed, fixed-purpose images run directly on a hypervisor without an intervening guest OS such as Linux.

[Figure: unikernel architecture illustration. Image credit: Xen Project.]

As well as improving upon container technologies, unikernels are also able to deliver impressive flexibility, speed and versatility for cross-platform environments, big data analytics and scale-out cloud computing. Like container-based solutions, this technology fulfills the promise of easy deployment, but unikernels also offer an extremely tiny, specialized runtime footprint that is much less vulnerable to attack.

There are several up-and-coming open source projects to watch this year, including ClickOS, Clive, HaLVM, LING, MirageOS, Rump Kernels and OSv, among others, with each of them placing emphasis on a different aspect of the unikernel approach. For example, MirageOS and HaLVM take a clean-slate approach and focus on safety and security, ClickOS emphasizes speed, while OSv and Rump Kernels aim for compatibility with legacy software. Such flexible approaches are not possible with existing monolithic operating systems, which have decades of assumptions and trade-offs baked into them.

How are unikernels able to deliver better security? How do the various unikernel implementations differ in their approach? Who is using the technology today? What are the key benefits to cloud and data center operators? Will unikernels on hypervisors replace containers, or will enterprises use a mix of all three? If so, how and why? Answers to these questions and insights from the key developers behind these exciting new projects will be covered in parts two and three of this series.

For those interested in learning more about unikernels, the entire series is available as a white paper titled “The Next Generation Cloud: The Rise of the Unikernel.”