Category Archives: Commentary

A community member shares a viewpoint

What You Need to Know about Recent Xen Project Security Advisories

Today the Xen Project announced eight security advisories: XSA-191 to XSA-198. The bulk of the underlying vulnerabilities were discovered and fixed during the hardening phase of the Xen Project Hypervisor 4.8 release (expected to come out in early December). The Xen Project has implemented a security-first approach when publishing new releases.

In order to increase the security of future releases, members of the Xen Project Security Team and key contributors to the Xen Project actively search for and fix security bugs in code areas where vulnerabilities were found in past releases. The contributors use techniques such as code inspections, static code analysis, and additional testing using fuzzers such as American Fuzzy Lop. These fixes are then backported to older Xen Project releases with security support and published in bulk to make it easier for downstream consumers to apply security fixes.

Before we declare a new Xen Project feature as supported, we perform a security assessment (see, for example, the assessment performed before declaring the Credit2 scheduler supported). In addition, the contributors focused on security have started crafting tests for each vulnerability and integrating them into our automated regression testing system, which is run regularly on all maintained versions of Xen. This ensures that the patch will be applied to every version which is vulnerable, and also ensures that no bug is accidentally reintroduced as development continues to go forward.

The Xen Project’s mature and robust security response process is optimized for cloud environments and downstream Xen Project consumers to maximize fairness, effectiveness and transparency. This includes not publicly discussing any details with security implications during our embargo period. This encourages anyone to report bugs they find to the Xen Project Security Team, and allows the Security Team to assess, respond, and prepare patches before public disclosure and broad compromise occur.

During the embargo period, the Xen Project does not publicly discuss any details with security implications except:

  • when co-opting technical assistance from other parties;
  • when issuing a Xen Project Security Advisory (XSA). This is pre-disclosed to only members on the Xen Project Pre-Disclosure List (see www.xenproject.org/security-policy.html); and
  • when necessary to coordinate with other affected projects.

The Xen Project Security Team will assign and publicly release numbers for vulnerabilities. This is the only information that is shared publicly during the embargo period. See “XSA Advisories, Publicly Released or Pre-Released” at xenbits.xen.org/xsa.

The latest advisories, XSA-191 through XSA-198, can all be found here:
xenbits.xen.org/xsa

Any Xen-based public cloud is eligible to be on our “pre-disclosure” list. Cloud providers on the list were notified of these vulnerabilities and provided with patches two weeks before the public announcement in order to make sure they all had time to apply the patches to their servers.

Distributions and other major vendors of Xen Project software were also given the patches in advance to make sure they had updated packages ready to download as soon as the vulnerabilities were announced. Private clouds and individuals are urged to apply the patches or update their packages as soon as possible.

The fixes for all of the above XSAs that affect the hypervisor can be deployed using the Xen Project Live Patching functionality, which enables reboot-free deployment of security patches to minimize disruption and downtime during security upgrades for system administrators and DevOps practitioners. The Xen Project encourages its users to download and apply these patches.

More information about the Xen Project’s Security Vulnerability Process, including the embargo and disclosure schedule, policies around embargoed information, information sharing among pre-disclosed list members, a list of pre-disclosure list members, and the application process to join the list, can be found at: www.xenproject.org/security-policy.html

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. x86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce the attack surface, but it is a very difficult task, one that requires minute knowledge of the app running inside. At any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application which is packaged to be self-contained, complying with the OCI Image Format or Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result a Linux system exposes a very high number of calls and, inescapably, also has a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle to do this, but hypervisors are very accomplished in this respect. After all, hypervisors don’t need to provide full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps. They cause duplication of functionality between the guest and the host. For example, the network stack is traversed twice, first in DomU then in Dom0. This is unnecessary. It is better to raise the hypervisor abstraction level by paravirtualizing a small set of syscalls directly.
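To make concrete what restricting syscalls entails, here is a minimal sketch of a syscall allowlist built with libseccomp, the mechanism Linux containers typically rely on. The example is illustrative only and not part of the original post; the handful of allowed calls is hypothetical and, as argued above, a real allowlist requires minute knowledge of the application:

    /* Illustrative libseccomp allowlist: deny every syscall by default
     * and permit only the few calls the application is known to need.
     * Build with: cc allowlist.c -lseccomp */
    #include <errno.h>
    #include <seccomp.h>
    #include <stdio.h>

    int main(void)
    {
        /* Default action: refuse unknown syscalls with EPERM. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_ERRNO(EPERM));
        if (!ctx)
            return 1;

        /* Hypothetical allowlist for a trivial network service. */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(accept), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        /* Load the filter into the kernel for this process. */
        if (seccomp_load(ctx) < 0) {
            seccomp_release(ctx);
            return 1;
        }
        seccomp_release(ctx);
        puts("filter loaded");
        return 0;
    }

Every additional syscall the application legitimately needs has to be discovered and added by hand, which is exactly the kind of effort that becomes a formidable challenge at scale.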

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls are at a higher level and made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0. I also wrote a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring. The backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket. The ring size is configurable on a per-socket basis. Data being received is copied to the ring by the backend, while data being sent is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
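To illustrate the shape of the protocol, here is a simplified, hypothetical sketch of how a frontend might package a connect() request for the shared ring. The struct layout, field names and command number below are made up for clarity; the authoritative data structures are those defined in the PV Calls specification posted on xen-devel, and a real frontend would place the request on the ring and notify the backend through the ring’s event channel rather than just filling a struct:

    /* Illustrative only: a simplified, hypothetical request layout for
     * forwarding a connect() call from the frontend to the backend. */
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>

    #define PVCALLS_CMD_CONNECT 2   /* hypothetical command number */

    struct pvcalls_connect_req {
        uint32_t req_id;            /* matches the response to the caller */
        uint32_t cmd;               /* which syscall is being forwarded */
        uint64_t socket_id;         /* guest-chosen handle for the socket */
        struct sockaddr_in addr;    /* destination address */
        uint32_t len;               /* length of addr */
    };

    /* Frontend side: build a request; a real driver would copy it into
     * the shared ring and signal the backend over the event channel. */
    static void build_connect_req(struct pvcalls_connect_req *req,
                                  uint64_t sock, const struct sockaddr_in *sa)
    {
        memset(req, 0, sizeof(*req));
        req->req_id = 1;
        req->cmd = PVCALLS_CMD_CONNECT;
        req->socket_id = sock;
        req->addr = *sa;
        req->len = sizeof(*sa);
    }

    int main(void)
    {
        struct sockaddr_in sa = { 0 };
        struct pvcalls_connect_req req;

        sa.sin_family = AF_INET;
        sa.sin_port = htons(80);
        build_connect_req(&req, 1, &sa);
        return req.cmd == PVCALLS_CMD_CONNECT ? 0 : 1;
    }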

We are still running virtual machines, but mainly to restrict the vast majority of application syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored for container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to be made from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 Gbit/s with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 Gbit/s and outperform not just netfront/netback but even Docker.

The benchmarks have been run on an Intel Xeon D-1540 machine, with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vcpus each, pinned. DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very much appreciated!

Q&A: Xen Project Release Strengthens Security and Pushes New Use Cases

The following Q&A with Lars Kurth, the Xen Project chairperson, was first published on Linux.com.

Xen Project technology supports more than 10 million users and is a staple in some of the largest clouds in production today, including Amazon Web Services, Tencent, and Alibaba’s Aliyun. Recently, the project announced the arrival of Xen Project Hypervisor 4.7. This new release focuses on improving code quality, security hardening and features, and support for the latest hardware. It is also the first release of the project’s fixed-term June – December release cycles; these provide more predictability, making it easier for consumers of Xen to plan ahead.

We recently sat down with the Xen Project chairperson, Lars Kurth, to talk about some of the key features of the release and the future of Xen Project technology. Lars will be discussing this topic and more during the Xen Project Developer Summit in Toronto, Canada, from August 25-26 — the conference takes place directly after LinuxCon North America.

Q: What was the focus of this release?

Lars Kurth: There were five areas that we focused on for this release (full details are in our blog). In summary, we focused on security features, migration support, performance and workloads, support for new hardware features, and drivers and devices (Linux, FreeBSD and other).

Security is consistently something that we focus on in all of our releases. A lot of people rely on Xen Project technology, so security is our top concern in any release, as is how we organize our process around security disclosures.

Q: What was the biggest feature coming out of this release?

Lars: The biggest feature for us is Live Patching, a technology that enables reboot-free deployment of security patches to minimize disruption and downtime during security upgrades for cloud admins. It essentially eliminates all cloud reboots, making cloud providers and their users much safer. It also eliminates a lot of headaches for the system and DevOps admins of the world.

Q: Xen is often associated with the cloud, but are there additional use cases that you see growing around this technology? If so, why?

Lars: We are seeing a lot of growth in terms of contributions, as well as many different use cases emerging, including automotive, aviation, embedded scenarios, security, and also IoT. In addition, we continue to grow within the public cloud sector and traditional server virtualization.

On the security front, for example, a number of vendors such as A1Logic, Bitdefender, Star Lab and Zentific have released or are working on new Xen Project-based security solutions. In addition, the security focused and Xen-based OpenXT project has started to work more closely with the Xen Project community.

Long-time contributors to the Xen Project, such as DornerWorks – a premier provider of electronic engineering services for the aerospace, medical, automotive, and industrial markets – have expanded their scope and are now providing support for the Xen Xilinx Zynq Distribution targeting embedded use-cases. We have also seen an increasing number of POCs and demos of automotive solutions, which include Xen as a virtualization solution.

Growth in these sectors is largely due to the Xen Project’s flexibility, extensibility, customisability and a clear lead when it comes to security-related technologies. Over the last year, we have also seen contributions increase from developers with strong security and embedded backgrounds. In fact, this totaled nearly 17 percent of the overall contributions in this release cycle, up from 9 percent in the previous release.

Q: How did you address these use cases in this latest release?

Lars: We introduced the ability to remove core Xen Project Hypervisor features at compile time via KCONFIG. This creates a more lightweight hypervisor and eliminates extra attack surface, which is beneficial in security-first environments and microservice architectures. Users will still be able to get the core hypervisor functions, but they won’t receive all the drivers, schedulers, components or features that might not fit their use case.

Essentially it gives people an “a la carte” feature set. They can decide what they need for compliance, safety or performance reasons.

Q: Were there any new contributors for this release that surprised you?

Lars: We had three new companies contributing to the project: Star Lab, Bosch and Netflix. I met engineers from Star Lab for the first time at the 2015 Developer Summit less than a year ago, and helped introduce them to the Project’s culture. In that short period of time, Doug Goldstein from Star Lab has moved into the top five contributors and top 10 code reviewers for the Project.

I was surprised about Netflix’s contributions; I didn’t even know the company used Xen. Netflix improved and secured the VPMU feature, which is incredibly useful for system tuning and performance monitoring. Bosch Car Multimedia GmbH added some new ARM functionality. In addition, we have seen quite a bit of Xen-related development in upstream and downstream projects such as Linux, FreeBSD, NetBSD, OpenBSD, QEMU and Libvirt.

Q: What’s next for Xen Project? Where do you think the technology is heading in the future and why?

Lars: In the last three releases, we introduced several major new features such as PVH, COLO, new schedulers, VMI, Live Patching, Graphics Virtualization, etc., and significant re-work of existing features such as Migration and the Xen Security Modules (XSM). Looking at trends within the community, I expect that stepwise evolution of large new features will continue.

Some new capabilities, such as restartable Dom0s, and additional techniques to provide more isolation and security, are also likely to appear. In addition, it looks likely that we will see some GPU virtualization capabilities for GPUs that target the ARM ecosystem, although it is not yet clear whether these will be available as open source. I also expect that both Intel and ARM hardware features will be closely tracked.

Some areas, such as new schedulers, XSM, PVH and Live Patching, will see significant efforts to harden and improve existing functionality. The goal is to ensure their swift adoption in commercial products and in Linux and BSD distributions. Some features which are not enabled by default are likely to become part of the Xen Project Hypervisor’s default configuration.

Will Docker Replace Virtual Machines?

Docker is certainly the most influential open source project of the moment. Why is Docker so successful? Is it going to replace Virtual Machines? Will there be a big switch? If so, when?

Let’s look at the past to understand the present and predict the future. Before virtual machines, system administrators used to provision physical boxes to their users. The process was cumbersome, not completely automated, and it took hours if not days. When something went wrong, they had to run to the server room to replace the physical box.

With the advent of virtual machines, DevOps teams could install any hypervisor on all their boxes, then simply provision new virtual machines upon request from their users. Provisioning a VM took minutes instead of hours and could be automated. The underlying hardware made less of a difference and was mostly commoditized. If one needed more resources, they would just create a new VM. If a physical machine broke, the admin just migrated or resumed her VMs onto a different host.

Finer-grained deployment models became viable and convenient. Users were no longer forced to run all their applications on the same box just to exploit the underlying hardware capabilities to the fullest. One could run a VM with the database, another with the middleware and a third with the webserver without worrying about hardware utilization. The people buying the hardware and the people architecting the software stack could work independently in the same company, without interference. The new interface between the two teams had become the virtual machine. Solution architects could cheaply deploy each application on a different VM, reducing their maintenance costs significantly. Software engineers loved it. This might have been the biggest innovation introduced by hypervisors.

A few years passed and everybody in the business got accustomed to working with virtual machines. Startups don’t even buy server hardware anymore; they just shop on Amazon AWS. One virtual machine per application is the standard way to deploy software stacks.

Application deployment hasn’t changed much since the ’90s though. It still involves installing a Linux distro, mostly built for physical hardware, installing the required deb or rpm packages, and finally installing and configuring the application that one actually wants to run.

In 2013 Docker came out with a simple yet effective tool to create, distribute and deploy applications wrapped in a nice format to run in independent Linux containers. It comes with a registry that is like an app store for these applications, which I’ll call “cloud apps” for clarity. Deploying the Nginx webserver had just become one “docker pull nginx” away. This is much quicker and simpler than installing the latest Ubuntu LTS. Docker cloud apps come preconfigured and without any of the unnecessary packages that are unavoidably installed by Linux distros. In fact, the Nginx Docker cloud app is produced and distributed by the Nginx community directly, rather than by Canonical or Red Hat.

Docker’s outstanding innovation is the introduction of a standard format for cloud applications, together with the registry. Instead of using VMs, Linux containers are used to run cloud apps. Containers had been available for years, but they weren’t quite popular outside Google and a few other circles. Although they offer very good performance, they have fewer features and weaker isolation compared to virtual machines. As a rising star, Docker made Linux containers suddenly popular, but containers were not the reason behind Docker’s success; their use was incidental.

What is the problem with containers? Their live-migration support is still very green and they cannot run non-native workloads (Windows on Linux or Linux on Windows). Furthermore, the primary challenge with containers is security: the attack surface is far larger compared to virtual machines. In fact, multi-tenant container deployments are strongly discouraged by Docker, CoreOS, and everybody else in the industry. With virtual machines you don’t have to worry about who is going to use them or how they will be used. On the other hand, only containers that belong to the same user should be run on the same host. Amazon and Google offer container hosting, but they both run each container on top of a separate virtual machine for isolation and security. Maybe inefficient, but certainly simple and effective.

People are starting to notice this. At the beginning of the year a few high-profile projects launched to bring the benefits of virtual machines to Docker, in particular Clear Linux by Intel and Hyper. Both of them use conventional virtual machines to run Docker cloud applications directly (no Linux containers are involved). We did a few tests with Xen: tuning the hypervisor for this use case allowed us to reach the same startup times offered by Linux containers, while retaining all the other features. A similar effort by Intel for Xen is being presented at the Xen Developer Summit, and Hyper is also presenting their work.

This new direction has the potential to deliver the best of both worlds to our users: the convenience of Docker with the security of virtual machines. Soon Docker might not be fighting virtual machines at all; Docker could be the one deploying them.

A Chinese translation of the article is available here: http://dockone.io/article/598

The Bare-Metal Hypervisor as a Platform for Innovation

In this industry, everyone seems to talk about innovation, but very few platforms exist which foster innovation. More often than not, “innovation” is simply a buzzword used by some marketing campaign to hawk something about as novel as twenty-year-old accounting software.

Innovation does occur, of course. But often real innovation leverages what already exists to create something which doesn’t yet exist. It may borrow from the known, but it produces something previously unknown. For example, the industry has been going wild over cloud computing in the past few years, but many of the core cloud computing concepts are actually old mainframe concepts reimagined in the world of commodity servers.

Making a Place for Innovation to Thrive

A bare-metal hypervisor — like the one produced by the Xen Project — can be an excellent platform for innovation.  We think of hypervisors as old technology, plumbing for newer technologies like cloud — and, indeed, they are.  But the nature of the bare-metal hypervisor makes it an excellent platform for innovation to take place.


Hardening Hypervisors Against VENOM-Style Attacks

This is a guest blog post by Tamas K. Lengyel, a long-time open source enthusiast and Xen contributor. Tamas works as a Senior Security Researcher at Novetta, while finishing his PhD on the topic of malware analysis and virtualization security at the University of Connecticut.

The recent disclosure of the VENOM bug affecting major open-source hypervisors, such as KVM and Xen, has many reevaluating their risks when using cloud infrastructures. That is a very good thing indeed. Complex software systems are prone to bugs and errors.  Virtualization and the cloud have been erroneously considered by many to be a silver bullet against intrusions and malware. The fact is the cloud is anything but a safe place. There are inherent risks in the cloud and it’s very important to put those risks in their proper context.


VENOM is one of many vulnerabilities involving hypervisors (for example in QEMU, ESX, KVM and Xen). And, while VENOM is indeed a serious bug and can result in a VM escape, which in turn can compromise all VMs on the host, it doesn’t have to be that way. In fact, we’ve known about VENOM-style attacks for a long time. Yet, there are easy-to-deploy counter-measures to mitigate the risk of such exploits natively available in both Xen and KVM (see Red Hat’s blog post on the same topic).

What is the root cause of VENOM? Emulation

While modern systems come with a plethora of virtualization extensions, many components are still being emulated, usually devices such as your network card, graphics card and hard drive. While Linux comes with paravirtual (virtualization-aware) drivers to create such devices, emulation is often the only solution for running operating systems that do not have such kernel drivers. This has traditionally been the case with Windows (note that Citrix recently open-sourced its Windows PV drivers, so this is no longer the case). This emulation layer is implemented in QEMU, which is where VENOM and a handful of other VM-escape bugs originated in recent years. However, QEMU is not the only place where emulation is used within Xen. For a variety of interrupts and timers, such as the RTC, PIT, HPET, PMTimer, PIC, IOAPIC and LAPIC, there is a baked-in emulator in Xen itself for performance and scalability reasons. As with QEMU, this emulator has been just as exploitable. This is simply because programming emulators properly is really complex and error-prone.

Available Mitigations

We’ve known how complicated emulation is for some time. In 2011, this Black Hat presentation on Virtunoid demonstrated a VM escape attack against KVM through QEMU. The author of Virtunoid even cautioned there would be many bugs of that nature found by researchers. Fast-forward four years and we now have VENOM.

The good news is it’s easy to mitigate all present and future QEMU bugs, which the recent Xen Security Advisory emphasized as well. Stubdomains can nip the whole class of vulnerabilities exposed by QEMU in the bud by moving QEMU into a de-privileged domain of its own. Instead of having QEMU run as root in dom0, a stubdomain has access only to the VM it is providing emulation for. Thus, an escape through QEMU will only land an attacker in a stubdomain, without access to critical resources. Furthermore, QEMU in a stubdomain runs on MiniOS, so an attacker would only have a very limited environment to run code in (as in return-to-libc/ROP-style), having exactly the same level of privilege as in the domain where the attack started. Nothing is to be gained for a lot of work, effectively making the system as secure as it would be if only PV drivers were used.

While it might sound complex, it’s actually quite simple to take advantage of this protection. Simply add the following line to your Xen domain configuration, and you gain immediate protection against VENOM and the whole class of vulnerabilities brought to you by QEMU:

device_model_stubdomain_override = 1
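For context, the override typically sits alongside the rest of an HVM guest’s xl configuration. The fragment below is only a sketch with placeholder names and sizes, not a recommended configuration:

    # Illustrative HVM guest configuration with a QEMU stubdomain.
    # All names, sizes and paths are placeholders.
    builder = "hvm"
    name = "guest1"
    memory = 2048
    vcpus = 2
    disk = [ "phy:/dev/vg0/guest1,xvda,w" ]
    vif = [ "bridge=xenbr0" ]
    device_model_stubdomain_override = 1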

However, as with most security systems, it comes at a cost. Running stubdomains requires a bit of extra memory as compared to the plain QEMU process in dom0. This in turn can limit the number of VMs a cloud provider may be able to sell, so they have little incentive to enable this feature by default on their end. However, this protection works best if all VMs on the server run with stubdomains. Otherwise you may end up protecting your neighbors against an escape attack from your domain, while not being protected from an escape attack from theirs. It’s no surprise that some users would not want to pay for this extra protection, even if it was available. But for private clouds and exclusive servers there is no reason why it shouldn’t be your default setting.