Tag Archives: containers

The Power of Hypervisor-Based Containers

The modern trend towards cloud-native apps seems to be set to kill hypervisors with a long slow death. Paradoxically, it is the massive success of hypervisors and infrastructure-as-a-service during the last 15 years that enabled this trend.

Stefano Stabellini provides an overview of the rise of containers and how hypervisors are co-existing and thriving in the era of containers. Read more here.

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. X86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce the attack surface, however it is a very difficult task, one that requires minute knowledge of the app running inside. At any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application which is packaged to be self contained, complying to the OCI Image Format or Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionalities. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with and Linux adds its own set of non-standard calls on top of it. As a result a Linux system has a very high number of exposed calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with it, but hypervisors are very accomplished in this respect. After all hypervisors don’t need to have full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionalities with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps. They cause duplication of functionalities between the guest and the host. For example, the network stack is traversed twice, first in DomU then in Dom0. This is unnecessary. It is better to raise hypervisor abstractions by paravirtualizing a small set of syscalls directly.

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware because syscalls are at a higher level and made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0. I also wrote a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscalls requests over a ring. The backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket. The ring size is configurable on a per socket basis. Receiving data is copied to the ring by the backend, while sending data is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.

We are still running virtual machines, but mainly to restrict the vast majority of applications syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored for containers workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to be made from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configurations are required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 gbit/sec with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 gbit/sec and outperform not just netfront/netback but even Docker.

The benchmarks have been run on an Intel Xeon D-1540 machine, with 8 cores (16 threads) and 32 GB of ram. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vcpus each, pinned. DomU has 4 GB of ram.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very appreciated!


Why Unikernels Can Improve Internet Security

This is a reprint of a 3-part unikernel series published on Linux.com. In this post, Xen Project Advisory Board Chairman Lars Kurth explains how unikernels address security and allow for the careful management of particularly critical portions of an organization’s data and processing needs. (See part one, 7 Unikernel Projects to Take On Docker in 2015.)

Many industries are rapidly moving toward networked, scale-out designs with new and varying workloads and data types. Yet, pick any industry —  retail, banking, health care, social networking or entertainment —  and you’ll find security risks and vulnerabilities are highly problematic, costly and dangerous.

Adam Wick, creator of the The Haskell Lightweight Virtual Machine (HaLVM) and a research lead at Galois Inc., which counts the U.S. Department of Defense and DARPA as clients, says 2015 is already turning out to be a break-out year for security.

“Cloud computing has been a hot topic for several years now, and we’ve seen a wealth of projects and technologies that take advantage of the flexibility the cloud offers,” said Wick. “At the same time though, we’ve seen record-breaking security breach after record-breaking security breach.”

The names are more evocative and well-known thanks to online news and social media, but low-level bugs have always plagued network services, Wick said. So, why is security more important today than ever before?

Improving Security

The creator of MirageOS, Anil Madhavapeddy, says it’s “simply irresponsible to continue to knowingly provision code that is potentially unsafe, and especially so as we head into a year full of promise about smart cities and ubiquitous Internet of Things. We wouldn’t build a bridge on top of quicksand, and should treat our online infrastructure with the same level of respect and attention as we give our physical structures.”

In the hopes of improving security, performance and scalability, there’s a flurry of interesting work taking place around blocking out functionality into containers and lighter-weight unikernel alternatives. Galois, which specializes in R&D for new technologies, says enterprises are increasingly interested in the ability to cleanly separate functionality to limit the effect of a breach to just the component affected, rather than infecting the whole system.

For next-generation clouds and in-house clouds, unikernels make it possible to run thousands of small VMs per host. Galois, for example, uses this capability in their CyberChaff project, which uses minimal VMs to improve intrusion detection on sensitive networks, while others have used similar mechanisms to save considerable cost in hardware, electricity, and cooling; all while reducing the attack surface exposed to malicious hackers. These are welcome developments for anyone concerned with system and network security and help to explain why traditional hypervisors will remain relevant for a wide range of customers well into the future.

Madhavapeddy goes as far to say that certain unikernel architectures would have directly tackled last year’s Heartbleed and Shellshock bugs.

“For example, end-to-end memory safety prevents Heartbleed-style attacks in MirageOS and the HaLVM. And an emphasis on compile-time specialization eliminates complex runtime code such as Unix shells from the images that are deployed onto the cloud,” he said.

The MirageOS team has also put their stack to the test by releasing a “Bitcoin pinata,” which is a unikernel that guards a collection of Bitcoins.  The Bitcoins can only be claimed by breaking through the unikernel security (for example, by compromising the SSL/TLS stack) and then moving the coins.  If the Bitcoins are indeed transferred away, then the public transaction record will reflect that there is a security hole to be fixed.  The contest has been running since February 2015 and the Bitcoins have not yet been taken.


Linux container vs. unikernel security

Linux, as well as Linux containers and Docker images, rely on a fairly heavyweight core OS to provide critical services. Because of this, a vulnerability in the Linux kernel affects every Linux container, Wick said. Instead, using an approach similar to a la carte menus, unikernels only include the minimal functionality and systems needed to run an application or service, all of which makes writing an exploit to attack them much more difficult.

Cloudius Systems, which is running a private beta of OSv, which it tags as the operating system for the cloud, recognizes that progress is being made on this front.

“Rocket is indeed an improvement over Docker, but containers aren’t a multi-tenant solution by design,” said CEO Dor Laor. “No matter how many SELinux Linux policies you throw on containers, the attack surface will still span all aspects of the kernel.”

Martin Lucina, who is working on the Rump Kernel software stack, which enables running existing unmodified POSIX software without an operating system on various platforms, including bare metal embedded systems and unikernels on Xen, explains that unikernels running on the Xen Project hypervisor benefit from the strong isolation guarantees of hardware virtualization and a trusted computing base that is orders of magnitude smaller than that of container technologies.

“There is no shell, you cannot exec() a new process, and in some cases you don’t even need to include a full TCP stack. So there is very little exploit code can do to gain a permanent foothold in the system,” Lucina said.

The key takeaway for organizations worried about security is that they should treat their infrastructure in a less monolithic way. Unikernels allow for the careful management of particularly critical portions of an organization’s data and processing needs. While it does take some extra work, it’s getting easier every day as more developers work on solving challenges with orchestration, logging and monitoring. This means unikernels are coming of age just as many developers are getting serious about security as they begin to build scale-out, distributed systems.

For those interested in learning more about unikernels, the entire series is available as a white paper titled “The Next Generation Cloud: The Rise of the Unikernel.”

Read part 1: 7 Unikernel Projects to Take On Docker in 2015

Xen Users Commonly Asked Questions on Wiki

Xen Community:

I have ported the xen-users mailing list commonly asked questions guide from a pdf to an editable wiki page at http://wiki.xensource.com/xenwiki/XenUsersQuestions. I update this information monthly based on the previous months questions on the xen-users mailing list. As this is now a wiki document, it will be easier for update by the community.

The Wiki page has all the questions broken down into the following groupings to simplify searching:

  • Guest Related Questions
  • Installation Questions
  • Networking Questions
  • High Availability Questions
  • Performance Questions
  • Security Questions
  • Design/Misc Questions

Now where are my domUs?

It’s a question many will ask at some point. You’ve got Xen set up, used a graphical tool to configure some domUs (or downloaded some pre-built images, or followed a howto). But now you want to know where your virtual machines are actually stored. It’s a good question – and it has a slightly complicated answer.

Continue reading