Tag Archives: xen

Xen Project 4.6.6 and 4.7.3 are available

I am pleased to announce the release of Xen 4.6.6 and Xen 4.7.3. Xen Project maintenance releases are made in line with our Maintenance Release Policy. We recommend that all users of the 4.6 and 4.7 stable series update to the latest point release.

4.6.6
The release is available from its git repository
http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.6
(tag RELEASE-4.6.6) or from the Xen Project download page
http://www.xenproject.org/downloads/xen-archives/xen-46-series/xen-466.html

4.7.3
The release is available from its git repository
http://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.7
(tag RELEASE-4.7.3) or from the Xen Project download page
http://www.xenproject.org/downloads/xen-archives/xen-47-series/xen-473.html

These releases contain many bug fixes and improvements. For a complete list of changes, please check the lists of changes on the download pages.

What’s New in the Xen Project Hypervisor 4.9?

I am pleased to announce the release of the Xen Project Hypervisor 4.9. As always, we focused on improving code quality and security hardening, as well as enabling new features. Our approach to security is also the reason why we delayed this release by three weeks: security issues discovered during the hardening phase of this release were batched and handled under our Security Policy, which requires us to develop fixes for security issues in private and allows organisations on our pre-disclosure list to update their systems and software before any code is made public. Consequently, we had to wait until June 20 before we could apply the security fixes, then build and test the final release candidate.

The Xen Project Hypervisor 4.9 release focuses on advanced features for embedded, automotive and cloud-native computing use cases, enhanced boot configurations for more portability across different hardware platforms, the addition of new x86 instructions to accelerate machine learning workloads, and improvements to existing functionality related to the ARM® architecture, the device model operation hypercall, and more.

We are also pleased to announce that Julien Grall, Senior Software Engineer at ARM, will stay on as release manager for the Xen Project Hypervisor 4.10 release.

We grouped updates to the Xen Project Hypervisor into the following categories:

  • New Features
  • Improvements to Existing Functionality
  • Multi-Release Long-Term Development

New Features

Boot Xen on EFI platforms using GRUB2 (x86): From Xen Project 4.9 and GRUB2 2.02 onwards, the Xen Project Hypervisor can be booted using the multiboot2 protocol on legacy BIOS and EFI x86 platforms. Partial support for the multiboot2 protocol was also introduced into network boot firmware (iPXE). This makes the Xen Project boot process much more flexible. Boot configurations can be changed directly from within a bootloader (without having to use text editors) and boot configurations are more portable across different platforms.

Near-native latency for embedded and automotive environments: The “null” scheduler enables use cases where every virtual CPU is assigned to a physical CPU (as is commonly needed in embedded and automotive environments), removing almost all scheduler overhead. Using the “null” scheduler also guarantees significantly lower latency and more predictable performance. The new vwfi parameter for ARM (virtual Wait For Interrupt) allows fine-grained control over how the Xen Project Hypervisor handles WFI instructions. Setting vwfi to “native” reduces interrupt latency by approximately 60%. Benchmarks on Xilinx Zynq UltraScale+ MPSoCs have shown a maximum interrupt latency of less than 2 microseconds, which is extremely close to the hardware limit and should be small enough for the vast majority of embedded use cases.

Xen 4.9 includes new standard ABIs for sharing devices between virtual machines (including reference implementations) for a number of embedded, automotive and cloud native computing use-cases.

For embedded/automotive, a virtual sound ABI was added, implementing audio playback and capture as well as volume control and the ability to mute and unmute audio sources. In addition, a new virtual display ABI for complex display devices exposing multiple framebuffers and displays has been added. Multi-touch support has been added to the virtual keyboard/mouse protocol, enabling touch screens.

Xen 4.9 also introduces a Xen transport for 9pfs, a remote filesystem protocol originally written for Plan 9. During the Xen 4.9 release cycle, a Xen 9pfs frontend was upstreamed into the Linux kernel and a backend into QEMU. It is now possible to share a filesystem (not necessarily a block device) from one virtual machine to another, which is a requirement for adding Xen support to many container engines, such as CoreOS rkt.

The PV Calls ABI has been introduced to allow forwarding of POSIX requests across guests: a POSIX function call originating from an app in a DomU can be forwarded to and implemented in Dom0. For example, guest networking socket calls can be executed in Dom0, enabling a new networking model that is a natural fit for cloud-native apps.

Improvements to Existing Functionality

Xenstored optimisations: Xenstore daemons give Dom0 and guests access to system configuration information. The scalability limits of C-xenstored have been increased to allow large hosts (with more than about 1,000 domains) to run efficiently. Transaction handling has been improved for better performance, a smaller memory footprint and fewer transaction conflicts. Dynamic debugging capabilities have also been added.
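
To give a feel for what a xenstore transaction looks like from userspace, here is a minimal sketch using the conventional libxenstore (xenstore.h) API; the path and values are made up for illustration, and the program is linked with -lxenstore:

    /* Minimal sketch: write two related xenstore keys atomically in one
     * transaction. The path "/local/domain/1/data/example" is illustrative. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>
    #include <xenstore.h>

    int main(void)
    {
        struct xs_handle *xsh = xs_open(0);
        if (!xsh) {
            perror("xs_open");
            return 1;
        }

        /* Group related writes so other clients never see a partial update. */
        xs_transaction_t t = xs_transaction_start(xsh);
        bool ok = xs_write(xsh, t, "/local/domain/1/data/example/state",
                           "ready", strlen("ready")) &&
                  xs_write(xsh, t, "/local/domain/1/data/example/version",
                           "2", strlen("2"));

        /* Passing 'true' as the last argument aborts instead of committing. */
        if (!xs_transaction_end(xsh, t, !ok))
            fprintf(stderr, "commit failed or conflicted, retry\n");

        xs_close(xsh);
        return ok ? 0 : 1;
    }

If xs_transaction_end() fails with errno set to EAGAIN, the transaction conflicted with a concurrent update and should simply be retried; reducing how often such conflicts happen is part of the transaction-handling work mentioned above.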

DMOP (Device Model Operation Hypercall): In Xen 4.9, the interface between Xen and QEMU was completely re-worked and consolidated. There is now only a single device-model hypercall in Xen (the DMOP hypercall), which is carefully designed to allow the privcmd driver to audit any QEMU memory ranges and parameters that are passed to Xen via DMOP. The Linux privcmd driver enables DMOP auditing, which significantly limits the ability of a compromised QEMU to attack the hypervisor.
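
The important design point is that every piece of guest memory a DMOP call touches is passed explicitly as a (pointer, size) buffer, so privcmd can inspect the ranges before forwarding the hypercall. The sketch below illustrates only that shape; the struct layouts, names and op numbers are placeholders and not the real public ABI, which toolstacks reach through libxendevicemodel and /dev/xen/privcmd:

    /* Illustrative sketch of the DMOP calling convention: an operation
     * payload described by an explicit list of (pointer, size) buffers
     * that the privcmd driver can audit before the hypercall reaches Xen.
     * All names, layouts and op numbers are placeholders. */
    #include <stdint.h>
    #include <stdio.h>

    struct dmop_buf {                /* one auditable guest buffer */
        void     *ptr;
        uint64_t  size;
    };

    struct dmop_set_isa_irq {        /* hypothetical operation payload */
        uint32_t op;                 /* placeholder operation number */
        uint32_t irq;
        uint32_t level;
    };

    /* Stand-in for the kernel/privcmd path: a real driver would check every
     * (ptr, size) pair here before issuing the hypercall to Xen. */
    static int dmop_issue(unsigned int domid, struct dmop_buf *bufs,
                          unsigned int nr_bufs)
    {
        for (unsigned int i = 0; i < nr_bufs; i++)
            printf("dom%u: auditing buffer %u (%llu bytes)\n",
                   domid, i, (unsigned long long)bufs[i].size);
        return 0;                    /* pretend the hypercall succeeded */
    }

    int main(void)
    {
        struct dmop_set_isa_irq payload = { .op = 1, .irq = 8, .level = 1 };
        struct dmop_buf bufs[] = {
            { .ptr = &payload, .size = sizeof(payload) },
        };

        return dmop_issue(1 /* target domid */, bufs, 1);
    }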

Alternative runtime patching and GICv3 support for ARM32: Alternative runtime patching, which enables the hypervisor to apply workarounds for errata affecting the processor and to apply CPU-specific optimisations, as well as GICv3 support, has been extended to 32-bit ARM platforms, bringing this functionality to embedded use cases.

Intel and x86 Feature Support: The latest version of the Xen Project Hypervisor adds support for the AVX512_4VNNIW (Neural Network Instructions) and AVX512_4FMAPS (Multiply Accumulation Single Precision) subfamilies of the AVX-512 instruction set. With these instructions enabled in Xen for both HVM and PV guests, programs in guest OSes can take full advantage of them to speed up machine learning computing. This Xen release also further enhances VT-d Posted Interrupt (PI) optimization, Machine Check Exception (MCE) handling, and more.
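
A guest can verify that these subfamilies have actually been exposed to it by inspecting CPUID leaf 7. The sketch below assumes the bit positions published by Intel (EDX bits 2 and 3 of leaf 7, sub-leaf 0, for AVX512_4VNNIW and AVX512_4FMAPS respectively); double-check them against the current SDM:

    /* Small guest-side check for the AVX512_4VNNIW / AVX512_4FMAPS CPUID
     * feature bits. Build with GCC or Clang on x86. */
    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        /* CPUID leaf 7, sub-leaf 0. */
        __cpuid_count(7, 0, eax, ebx, ecx, edx);

        printf("AVX512_4VNNIW: %s\n", (edx & (1u << 2)) ? "yes" : "no");
        printf("AVX512_4FMAPS: %s\n", (edx & (1u << 3)) ? "yes" : "no");
        return 0;
    }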

System Error Detection (ARM): Xen on ARM took a step forward in reliability and serviceability with the introduction of System Error detection and reporting, a key feature for customers with highly available systems.

GCOV support: We removed the old GCOV implementation and replaced it with an updated version that supports more formats and exposes a more generic interface.

Re-work and hardening of x86 emulation code for security: Hardware-assisted virtualisation allows hypervisors to execute most privileged instructions natively and securely. However, for some boundary cases it is still necessary to emulate x86 instructions in software. In Xen 4.9, the project completely re-worked the x86 emulation code, added support for new instructions, audited the code for security vulnerabilities, and created AFL-based fuzzing tests that are regularly run against the emulator.
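
For readers unfamiliar with AFL, the harness pattern is straightforward: AFL repeatedly runs the target on mutated input files and watches for crashes and hangs. The sketch below shows that generic pattern with a stand-in emulate_bytes() entry point; it is not the actual Xen fuzzing harness, which links against the hypervisor's emulator sources directly:

    /* Generic shape of an AFL harness: read the (mutated) input file that
     * AFL supplies as argv[1] and hand the raw bytes to the code under test.
     * emulate_bytes() is a stand-in for the real emulator entry point. */
    #include <stdio.h>
    #include <stdlib.h>

    static void emulate_bytes(const unsigned char *buf, size_t len)
    {
        (void)buf;
        (void)len;
        /* ...decode and emulate one or more instructions from buf... */
    }

    int main(int argc, char **argv)
    {
        if (argc < 2)
            return 1;

        FILE *f = fopen(argv[1], "rb");
        if (!f)
            return 1;

        unsigned char buf[4096];
        size_t len = fread(buf, 1, sizeof(buf), f);
        fclose(f);

        emulate_bytes(buf, len);   /* AFL watches for crashes/hangs here */
        return 0;
    }

A harness like this would typically be built with afl-gcc and driven with afl-fuzz, passing the input file via AFL's @@ placeholder.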

Updated support for Microsoft’s Hyper-V Hypervisor Top-Level Functional Specification (also known as Viridian Enlightenments): Xen implements a subset of version 5.0 of the Hyper-V Hypervisor TLFS, which enables Xen to run Windows guests with performance similar to what they would achieve on Hyper-V. In addition, this work lays the groundwork for running Hyper-V within Xen in the future using nested virtualization.

Multi-Release Long-Term Development

This section contains large feature developments that cover several release cycles. It is intended to provide a progress update for larger features.

Transition from PVHv1 to PVHv2: Xen Project 4.8 laid the groundwork for re-architecting and simplifying PVH, focusing on the DomU guest ABI, which enabled guest operating system developers to start porting their OSes to this mode. Support for FreeBSD is in progress, while support for Linux is committed. Xen 4.9 added Dom0 builder support and support for multiple virtual Intel I/O Advanced Programmable Interrupt Controllers (vIO APICs). PVHv2 support for interrupt routing and PCI emulation is currently being peer reviewed and can be expected early in the Xen 4.10 release cycle; this lays the groundwork for a PVHv2 Dom0. For PVHv2 DomU support, work on PCI passthrough and a major re-work of the xl/libxl and libvirt user interfaces for PVH has been started. Support for PVHv1 has been removed from the Xen codebase.

Reworking the Xen-QEMU integration to protect against QEMU security vulnerabilities: In Xen Project 4.8, we embarked on an effort to re-work the Xen-QEMU integration, which amounts to sandboxing QEMU within Dom0. Significant progress was made towards this goal in Xen 4.9 with the implementation of DMOP. Other changes, such as de-privileging QEMU in Dom0 and changes to the Linux privcmd driver, have been mostly completed in Xen 4.9. Changes that have been designed but not yet implemented include the necessary changes to libxl and to QEMU’s usage of XenStore.

Summary

Despite the shorter release cycle, the community developed several major features, and found and fixed many more bugs. Compared to Xen 4.8, which was our first fixed-term release, we have seen increased development velocity (developers contributed 1245 changes in Xen 4.8 and 1549 in Xen 4.9, an increase of more than 20%), increased code review activity, and more contributors, both individuals and organisations. For Xen 4.8, a total of 68 developers from 25 employers contributed; for Xen 4.9, a total of 86 developers from 30 employers contributed.

As in Xen 4.8, we took a security-first approach for Xen 4.9 and spent a lot of energy improving code quality and hardening security. This inevitably slowed down the acceptance of new features somewhat and also delayed the release. However, we believe that we reached a meaningful balance between mature security practices and innovation.

On behalf of the Xen Project Hypervisor team, I would like to thank everyone for their contributions (either in the form of patches, code reviews, bug reports or packaging efforts) to the Xen Project. Please check our acknowledgement page, which recognises all those who helped make this release happen.

The source can be found in the https://xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=refs/tags/RELEASE-4.9.0 tree (tag RELEASE-4.9.0) or can be downloaded as a tarball from our website. For detailed download and build instructions, check out the guide on building Xen 4.9.


PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. x86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to run several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce that attack surface, but it is a very difficult task, one that requires minute knowledge of the app running inside. At any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application packaged to be self-contained, complying with the OCI Image Format or the Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result, a Linux system exposes a very high number of calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with this, but hypervisors are very accomplished in this respect. After all, hypervisors don’t need to provide full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps. They cause duplication of functionality between the guest and the host. For example, the network stack is traversed twice, first in DomU and then in Dom0. This is unnecessary. It is better to raise the level of hypervisor abstractions by paravirtualizing a small set of syscalls directly.

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls are at a higher level and made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0, and a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring. The backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket; the ring size is configurable on a per-socket basis. Received data is copied to the ring by the backend, while data to be sent is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
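
To make the flow more concrete, here is a rough sketch of what a command-ring request and response could look like. The layout, field names and command numbers below are invented for illustration and do not match the actual PV Calls wire format, which is defined in the specification on xen-devel:

    /* Illustrative only: a made-up request/response layout showing how a
     * frontend could describe a "connect" call on the shared command ring.
     * The real PV Calls ABI differs from this sketch. */
    #include <stdint.h>

    #define PVCALLS_OP_SOCKET   0   /* placeholder command numbers */
    #define PVCALLS_OP_CONNECT  1
    #define PVCALLS_OP_RELEASE  2

    struct pvcalls_request {
        uint32_t req_id;        /* matches the eventual response */
        uint32_t cmd;           /* which POSIX call is being forwarded */
        union {
            struct {
                uint64_t id;            /* frontend-chosen socket id */
                uint32_t domain, type, protocol;
            } socket;
            struct {
                uint64_t id;
                uint32_t len;           /* length of the sockaddr */
                uint32_t addr_gref;     /* grant ref of the address page */
                uint32_t evtchn;        /* event channel for the data ring */
                uint32_t dataring_gref; /* grant ref of the per-socket ring */
            } connect;
        } u;
    };

    struct pvcalls_response {
        uint32_t req_id;
        uint32_t cmd;
        int32_t  ret;           /* 0 or -errno, as for the POSIX call */
    };

The frontend places a request on the command ring, notifies the backend via an event channel, and later picks up the matching response. Data for an established socket then flows over that socket's own ring, as described above.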

We are still running virtual machines, but mainly to restrict the vast majority of application syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored to container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to come from the host IP address, and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for this: outgoing communications are made from the host IP address directly, and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container, in the case of Docker). PV Calls reach 75 Gbit/s with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container, in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 Gbit/s and outperform not just netfront/netback but even Docker.

The benchmarks were run on an Intel Xeon D-1540 machine with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vCPUs each, pinned. DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very much appreciated!

Xen Project Hackathon 16 : Event Report

We just wrapped up another successful Xen Project Hackathon, an annual event hosted by Xen Project member companies, typically at their corporate offices. This year’s event was hosted by ARM at its Cambridge HQ. 42 delegates from Aporeto, ARM, Assured Information Security, Automotive Electrical Systems, BAE Systems, Bromium, Citrix, GlobalLogic, OnApp, Onets, Oracle, StarLab, SUSE and Vates descended on Cambridge to attend. A big thank you (!) to ARM, and in particular to Thomas Molgaard, for organising the event and the social activities afterwards.

Here are a few images that helped capture the event:

Taking a breather and photo opp outside of ARM headquarters in Cambridge

Working on solving the mysteries of the world

Continuing to work hard on solving the mysteries of the world

Xen Project Hackathons have evolved in format into a series of structured problem solving sessions that scale up to 50 people. We combine this with a more traditional hackathon approach where programmers (and others involved in software development) collaborate intensively on software projects.

This year’s event was particularly productive because all our core developers and the project’s leadership were present. We covered a lot of topics, but two of our main themes this year revolved around security and community development. We’ll cover these topics, and how they fit into our next release (4.7) and development going forward, in more detail later; below is a little taste of some of the other themes of this year’s Hackathon sessions:

  • Security improvements: A trimmed-down QEMU to reduce the attack surface, de-privileging QEMU and the x86 emulator to reduce the impact of security vulnerabilities in those components, XSplice, Kconfig support (which allows parts of Xen to be removed at compile time), run-time disablement of Xen features to reduce the attack surface, disaggregation, and enabling XSM (Xen’s equivalent of Linux Security Modules, of which SELinux is the best-known example) by default.
  • Security features: We had two sessions on the future of XSplice (the first version of which will be released in Xen 4.7), which allows users of Xen to apply security fixes to a running Xen instance (i.e. with no need to reboot).
  • Robustness: A session on restartable Dom0 and driver domains, which again will significantly reduce the overhead of applying security patches.
  • Community and code review: A couple of sessions on optimising our working practices: most notably some clarifications to the maintainer role and how we can make code reviews more efficient.
  • Virtualization Modes: The next stage of PVH, which combines the best of HVM and PV. We also had discussions around some functionality currently being developed in Linux on which PVH depends.
  • Making Development more Scalable: A number of sessions to improve the toolstack and libxl. We covered topics such as making storage support pluggable via a plug-in architecture, making it easier to develop new PV drivers to support automotive and embedded vendors, and improvements to our build system, testing, stub domains and xenstored.
  • ARM support: There were a number of planning sessions for Xen ARM support. We covered the future roadmap, how to implement PCI passthrough, and how we can improve testing for the increasing range of ARM hardware with virtualization support, which is also applicable outside the server space.

There were many more sessions covering performance, scalability and other topics. The session hosts post meeting notes on xen-devel@ (search for “Hackathon” in the subject line) if you want to explore any topic in more detail. To make it easier for people who do not follow our development lists, we have also posted links to Hackathon-related xen-devel@ discussions on our wiki.

Besides providing an opportunity to meet face-to-face, build bridges and solve problems, we always make sure that we have social events. After all, Hackathons should be fun and bring people together. This year we had a dinner in Cambridge and, of course, the obligatory punting trip, which is part of every Cambridge visit.

Embarking on the punting journey

Continued exploration of discovering the mysteries of the universe, while on a boat

Again, a big thanks to ARM for hosting the event! Also, a reminder that we’ll be hosting our Xen Project Developer Summit next August in Toronto, Canada. The event takes place directly after LinuxCon North America and is a great opportunity to learn more about Xen Project development and what’s happening within the Linux Foundation ecosystem at large. The CFP is still open until May 6th!

Xen Project 4.5.2 Maintenance Release Available

I am pleased to announce the release of Xen 4.5.2. Xen Project maintenance releases are made roughly every four months, in line with our Maintenance Release Policy. We recommend that all users of the 4.5 stable series update to this point release.

Xen 4.5.2 is available immediately from its git repository:

    xenbits.xenproject.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.5
    (tag RELEASE-4.5.2)

or from the Xen Project download page at www.xenproject.org/downloads/xen-archives/xen-45-series/xen-452.html.

This release contains many bug fixes and improvements. For a complete list of changes in this release, please check the list of changes on the download page.

Will Docker Replace Virtual Machines?

Docker is certainly the most influential open source project of the moment. Why is Docker so successful? Is it going to replace Virtual Machines? Will there be a big switch? If so, when?

Let’s look at the past to understand the present and predict the future. Before virtual machines, system administrators used to provision physical boxes to their users. The process was cumbersome, not completely automated, and it took hours if not days. When something went wrong, they had to run to the server room to replace the physical box.

With the advent of virtual machines, DevOps teams could install a hypervisor on all their boxes and then simply provision new virtual machines upon request from their users. Provisioning a VM took minutes instead of hours and could be automated. The underlying hardware made less of a difference and was mostly commoditized. If users needed more resources, they would just create a new VM. If a physical machine broke, the admin just migrated or resumed her VMs onto a different host.

Finer-grained deployment models became viable and convenient. Users were no longer forced to run all their applications on the same box just to exploit the underlying hardware to the fullest. One could run a VM with the database, another with the middleware and a third with the webserver, without worrying about hardware utilization. The people buying the hardware and the people architecting the software stack could work independently in the same company, without interference. The new interface between the two teams had become the virtual machine. Solution architects could cheaply deploy each application on a different VM, reducing their maintenance costs significantly. Software engineers loved it. This might have been the biggest innovation introduced by hypervisors.

A few years passed and everybody in the business got accustomed to working with virtual machines. Startups don’t even buy server hardware anymore; they just shop on Amazon AWS. One virtual machine per application is the standard way to deploy software stacks.

Application deployment, though, hasn’t changed much since the ’90s. It still involves installing a Linux distro, mostly built for physical hardware, installing the required deb or rpm packages, and finally installing and configuring the application that one actually wants to run.

In 2013, Docker came out with a simple yet effective tool to create, distribute and deploy applications wrapped in a nice format to run in independent Linux containers. It comes with a registry that acts as an app store for these applications, which I’ll call “cloud apps” for clarity. Deploying the Nginx webserver has become just one “docker pull nginx” away. This is much quicker and simpler than installing the latest Ubuntu LTS. Docker cloud apps come preconfigured and without any of the unnecessary packages that are unavoidably installed by Linux distros. In fact, the Nginx Docker cloud app is produced and distributed by the Nginx community directly, rather than by Canonical or Red Hat.

Docker’s outstanding innovation is the introduction of a standard format for cloud applications, together with the registry. Instead of using VMs to run cloud apps, Linux containers are used. Containers had been available for years, but they weren’t widely popular outside Google and a few other circles. Although they offer very good performance, they have fewer features and weaker isolation compared to virtual machines. Docker, as a rising star, suddenly made Linux containers popular, but containers were not the reason behind Docker’s success; their use was incidental.

What is the problem with containers? Their live-migration support is still immature, and they cannot run non-native workloads (Windows on Linux or Linux on Windows). Furthermore, the primary challenge with containers is security: the attack surface is far larger compared to virtual machines. In fact, multi-tenant container deployments are strongly discouraged by Docker, CoreOS, and everybody else in the industry. With virtual machines you don’t have to worry about who is going to use them or how they will be used. On the other hand, only containers that belong to the same user should be run on the same host. Amazon and Google offer container hosting, but they both run each container on top of a separate virtual machine for isolation and security. Maybe inefficient, but certainly simple and effective.

People are starting to notice this. At the beginning of the year, a few high-profile projects were launched to bring the benefits of virtual machines to Docker, in particular Clear Linux by Intel and Hyper. Both of them use conventional virtual machines to run Docker cloud applications directly (no Linux containers are involved). We did a few tests with Xen: tuning the hypervisor for this use case allowed us to reach the same startup times offered by Linux containers, while retaining all the other features. A similar effort by Intel for Xen is being presented at the Xen Developer Summit, and Hyper is also presenting their work.

This new direction has the potential to deliver the best of both worlds to our users: the convenience of Docker with the security of virtual machines. Soon Docker might not be fighting virtual machines at all; it could be the one deploying them.

A Chinese translation of the article is available here: http://dockone.io/article/598