Tag Archives: containers

Unikraft: Unleashing the Power of Unikernels

This blog post was written by Dr. Felipe Huici, Chief Researcher, Systems and Machine Learning Group, at NEC Laboratories Europe

 The team at NEC Laboratories Europe spent quite a bit of time over the last few years developing unikernels – specialized virtual machine images targeting specific applications. This technology is fascinating to us because of its fantastic performance benefits: tiny memory footprints (hundreds of KBs or a few MBs), boot times compared to those of processes or throughput in the range of 10-40 Gb/s, among many other attributes. Specific metrics can be found in these articles: “My VM is Lighter (and Safer) than your Container,” “Unikernels Everywhere: The Case for Elastic CDNs,” and “ClickOS and the Art of Network Function Virtualization.”

The potential of unikernels is great (as you can see from the work above), but there hasn’t been a massive adoption of unikernels. Why? Development time.  For example, developing Minipython, a MicroPython unikernel, took the better part of three months to put together and test. ClickOS, a unikernel for NFV, was the result of a couple of years of work.

What’s particularly bad about this development model, besides the considerable time spent, is each unikernel is basically a “throwaway.” Every time we want to create a new unikernel targeting a different application, developers have to start from scratch. Essentially, there is a lack of shared research and development when it comes to building unikernels.

We (at NEC) wanted to change this, so we started to re-use the work and created a separate repo consisting of a “toolstack” that would contain functionality useful to multiple unikernels — mostly platform-independent versions of newlib and lwip (a C library and network stack intended for embedded systems).

This got us thinking that we should take our work to a much bigger level. We asked the question: Wouldn’t it be great to be able to very quickly choose, perhaps from a menu, the bits of functionality that we want for an unikernel, and to have a system automatically build all of these pieces together into a working image? It would also be great if we could choose multiple platforms (e.g., Xen, KVM, bare metal) without having to do additional work for each of them.

The result of that thought process is Unikraft. Unikraft decomposes operating systems into elementary pieces called libraries (e.g., schedulers, memory allocators, drivers, filesystems, network stacks, etc.) that users can then pick and choose from, using a menu to quickly build images tailored to the needs of specific applications. In greater detail, Unikraft consists of two basic components (see Figure 1):

  • Library pools contain libraries that the user of Unikraft can select from to create the unikernel. From the bottom up, library pools are organized into (1) the architecture library tool, containing libraries specific to a computer architecture (e.g., x86_64, ARM32 or MIPS); (2) the platform tool, where target platforms can be Xen, KVM, bare metal (i.e. no virtualization), user-space Linux and potentially even containers; and (3) the main library pool, containing a rich set of functionality to build the unikernel. This last library includes drivers (both virtual such as netback/netfront and physical such as ixgbe), filesystems, memory allocators, schedulers, network stacks, standard libs (e.g. libc, openssl, etc.), and runtimes (e.g. a Python interpreter and debugging and profiling tools). These pools of libraries constitute a codebase for creating unikernels. As shown, a library can be relatively large (e.g libc) or quite small (a scheduler), which allows for customization for the unikernel.
  • The Unikraft build tool is in charge of compiling the application and the selected libraries together to create a binary for a specific platform and architecture (e.g., Xen on x86_64). The tool is currently inspired by Linux’s KCONFIG system and consists of a set of Makefiles. It allows users to select libraries, to configure them, and to warn them when library dependencies are not met. In addition, the tool can also simultaneously generate binaries for multiple platforms.

unikraft

Figure 1. Unikraft architecture.

Getting Involved
We are very excited about the recent open source release of Unikraft as a Xen Project Foundation incubator project.  The Xen Project is a part of the Linux Foundation umbrella. We welcome developers willing to help improve Unikraft. Whether you’re interested in particular applications, programming languages, platforms, architectures or OS primitive. We are more than happy to build and receive contributions from the community. To get you started, here are a number of available resources:

Please don’t be shy about getting in touch with us, we would be more than happy to answer any questions you may have. You can reach the core Unikraft development team at sysml@listserv.neclab.eu .

The Power of Hypervisor-Based Containers

The modern trend towards cloud-native apps seems to be set to kill hypervisors with a long slow death. Paradoxically, it is the massive success of hypervisors and infrastructure-as-a-service during the last 15 years that enabled this trend.

Stefano Stabellini provides an overview of the rise of containers and how hypervisors are co-existing and thriving in the era of containers. Read more here.

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. X86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce the attack surface, however it is a very difficult task, one that requires minute knowledge of the app running inside. At any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application which is packaged to be self contained, complying to the OCI Image Format or Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionalities. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with and Linux adds its own set of non-standard calls on top of it. As a result a Linux system has a very high number of exposed calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with it, but hypervisors are very accomplished in this respect. After all hypervisors don’t need to have full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionalities with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps. They cause duplication of functionalities between the guest and the host. For example, the network stack is traversed twice, first in DomU then in Dom0. This is unnecessary. It is better to raise hypervisor abstractions by paravirtualizing a small set of syscalls directly.

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware because syscalls are at a higher level and made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0. I also wrote a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscalls requests over a ring. The backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket. The ring size is configurable on a per socket basis. Receiving data is copied to the ring by the backend, while sending data is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.

We are still running virtual machines, but mainly to restrict the vast majority of applications syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored for containers workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to be made from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configurations are required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 gbit/sec with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 gbit/sec and outperform not just netfront/netback but even Docker.

The benchmarks have been run on an Intel Xeon D-1540 machine, with 8 cores (16 threads) and 32 GB of ram. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vcpus each, pinned. DomU has 4 GB of ram.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very appreciated!

PIÑATA

Why Unikernels Can Improve Internet Security

This is a reprint of a 3-part unikernel series published on Linux.com. In this post, Xen Project Advisory Board Chairman Lars Kurth explains how unikernels address security and allow for the careful management of particularly critical portions of an organization’s data and processing needs. (See part one, 7 Unikernel Projects to Take On Docker in 2015.)

Many industries are rapidly moving toward networked, scale-out designs with new and varying workloads and data types. Yet, pick any industry —  retail, banking, health care, social networking or entertainment —  and you’ll find security risks and vulnerabilities are highly problematic, costly and dangerous.

Adam Wick, creator of the The Haskell Lightweight Virtual Machine (HaLVM) and a research lead at Galois Inc., which counts the U.S. Department of Defense and DARPA as clients, says 2015 is already turning out to be a break-out year for security.

“Cloud computing has been a hot topic for several years now, and we’ve seen a wealth of projects and technologies that take advantage of the flexibility the cloud offers,” said Wick. “At the same time though, we’ve seen record-breaking security breach after record-breaking security breach.”

The names are more evocative and well-known thanks to online news and social media, but low-level bugs have always plagued network services, Wick said. So, why is security more important today than ever before?

Improving Security

The creator of MirageOS, Anil Madhavapeddy, says it’s “simply irresponsible to continue to knowingly provision code that is potentially unsafe, and especially so as we head into a year full of promise about smart cities and ubiquitous Internet of Things. We wouldn’t build a bridge on top of quicksand, and should treat our online infrastructure with the same level of respect and attention as we give our physical structures.”

In the hopes of improving security, performance and scalability, there’s a flurry of interesting work taking place around blocking out functionality into containers and lighter-weight unikernel alternatives. Galois, which specializes in R&D for new technologies, says enterprises are increasingly interested in the ability to cleanly separate functionality to limit the effect of a breach to just the component affected, rather than infecting the whole system.

For next-generation clouds and in-house clouds, unikernels make it possible to run thousands of small VMs per host. Galois, for example, uses this capability in their CyberChaff project, which uses minimal VMs to improve intrusion detection on sensitive networks, while others have used similar mechanisms to save considerable cost in hardware, electricity, and cooling; all while reducing the attack surface exposed to malicious hackers. These are welcome developments for anyone concerned with system and network security and help to explain why traditional hypervisors will remain relevant for a wide range of customers well into the future.

Madhavapeddy goes as far to say that certain unikernel architectures would have directly tackled last year’s Heartbleed and Shellshock bugs.

“For example, end-to-end memory safety prevents Heartbleed-style attacks in MirageOS and the HaLVM. And an emphasis on compile-time specialization eliminates complex runtime code such as Unix shells from the images that are deployed onto the cloud,” he said.

The MirageOS team has also put their stack to the test by releasing a “Bitcoin pinata,” which is a unikernel that guards a collection of Bitcoins.  The Bitcoins can only be claimed by breaking through the unikernel security (for example, by compromising the SSL/TLS stack) and then moving the coins.  If the Bitcoins are indeed transferred away, then the public transaction record will reflect that there is a security hole to be fixed.  The contest has been running since February 2015 and the Bitcoins have not yet been taken.

PIÑATA

Linux container vs. unikernel security

Linux, as well as Linux containers and Docker images, rely on a fairly heavyweight core OS to provide critical services. Because of this, a vulnerability in the Linux kernel affects every Linux container, Wick said. Instead, using an approach similar to a la carte menus, unikernels only include the minimal functionality and systems needed to run an application or service, all of which makes writing an exploit to attack them much more difficult.

Cloudius Systems, which is running a private beta of OSv, which it tags as the operating system for the cloud, recognizes that progress is being made on this front.

“Rocket is indeed an improvement over Docker, but containers aren’t a multi-tenant solution by design,” said CEO Dor Laor. “No matter how many SELinux Linux policies you throw on containers, the attack surface will still span all aspects of the kernel.”

Martin Lucina, who is working on the Rump Kernel software stack, which enables running existing unmodified POSIX software without an operating system on various platforms, including bare metal embedded systems and unikernels on Xen, explains that unikernels running on the Xen Project hypervisor benefit from the strong isolation guarantees of hardware virtualization and a trusted computing base that is orders of magnitude smaller than that of container technologies.

“There is no shell, you cannot exec() a new process, and in some cases you don’t even need to include a full TCP stack. So there is very little exploit code can do to gain a permanent foothold in the system,” Lucina said.

The key takeaway for organizations worried about security is that they should treat their infrastructure in a less monolithic way. Unikernels allow for the careful management of particularly critical portions of an organization’s data and processing needs. While it does take some extra work, it’s getting easier every day as more developers work on solving challenges with orchestration, logging and monitoring. This means unikernels are coming of age just as many developers are getting serious about security as they begin to build scale-out, distributed systems.

For those interested in learning more about unikernels, the entire series is available as a white paper titled “The Next Generation Cloud: The Rise of the Unikernel.”

Read part 1: 7 Unikernel Projects to Take On Docker in 2015

Xen Users Commonly Asked Questions on Wiki

Xen Community:

I have ported the xen-users mailing list commonly asked questions guide from a pdf to an editable wiki page at http://wiki.xensource.com/xenwiki/XenUsersQuestions. I update this information monthly based on the previous months questions on the xen-users mailing list. As this is now a wiki document, it will be easier for update by the community.

The Wiki page has all the questions broken down into the following groupings to simplify searching:

  • Guest Related Questions
  • Installation Questions
  • Networking Questions
  • High Availability Questions
  • Performance Questions
  • Security Questions
  • Design/Misc Questions

Now where are my domUs?

It’s a question many will ask at some point. You’ve got Xen set up, used a graphical tool to configure some domUs (or downloaded some pre-built images, or followed a howto). But now you want to know where your virtual machines are actually stored. It’s a good question – and it has a slightly complicated answer.

Continue reading