
Improved Xen support in FreeBSD

As most FreeBSD users already know, FreeBSD 10 has just been released, and we expect this to be a very good release in terms of Xen support. It includes several performance and stability enhancements that we expect will greatly please and interest users. Many bugs have also been fixed, but the description below focuses only on the new features.

New vector callback

Previous releases of FreeBSD used an IRQ interrupt as the callback mechanism for Xen event channels. While this is easier to set up, an IRQ interrupt does not allow events to be injected into specific CPUs, which basically limited the use of event channels to disk and network drivers. It also meant that all interrupts were delivered to a single CPU (CPU#0), preventing proper interrupt balancing between CPUs.

With the introduction of the vector callback, events can now be delivered to any CPU, allowing FreeBSD to have specific per-CPU interrupts for PV timers and PV IPIs, while balancing the rest across the several CPUs usually available to a domain.
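
To illustrate what this looks like at the hypercall level, here is a minimal sketch that re-binds an already allocated event channel so its notifications are delivered to a chosen vCPU. It mirrors the EVTCHNOP_bind_vcpu operation from Xen's public event_channel.h; the hypercall wrapper name and the locally redefined constants are illustrative assumptions, not the actual FreeBSD code.

/* Sketch: re-bind an existing event channel to a specific vCPU, so that
 * notifications for it arrive through that CPU's vector callback.
 * Types and constants mirror Xen's public event_channel.h; the hypercall
 * wrapper is an assumption for illustration. */
#include <stdint.h>

typedef uint32_t evtchn_port_t;

struct evtchn_bind_vcpu {
    evtchn_port_t port;     /* IN: event channel to re-bind */
    uint32_t      vcpu;     /* IN: target virtual CPU */
};

#define EVTCHNOP_bind_vcpu 8    /* value from Xen's public event_channel.h */

extern int HYPERVISOR_event_channel_op(int cmd, void *arg);

static int
bind_evtchn_to_cpu(evtchn_port_t port, unsigned int vcpu)
{
    struct evtchn_bind_vcpu bind = {
        .port = port,
        .vcpu = vcpu,
    };

    /* After this succeeds, Xen delivers events for 'port' to 'vcpu'. */
    return (HYPERVISOR_event_channel_op(EVTCHNOP_bind_vcpu, &bind));
}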

PV timers

Thanks to the introduction of the vector callback, we can now make use of the Xen PV timer, which is implemented as a per-CPU single-shot timer. On its own this doesn't seem like a great benefit, but it allows FreeBSD to avoid using the emulated timers, greatly reducing the emulation overhead and the cost of unnecessary VMEXITs.
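
As a rough sketch of what the PV timer interface looks like (mirroring VCPUOP_set_singleshot_timer from Xen's public vcpu.h; the hypercall wrapper and function name are assumptions, and this is not FreeBSD's actual timer code), arming the per-CPU single-shot timer is a single hypercall:

/* Sketch: arm the Xen per-CPU single-shot timer.  Types and constants
 * mirror Xen's public vcpu.h; the hypercall wrapper is an assumption. */
#include <stdint.h>

struct vcpu_set_singleshot_timer {
    uint64_t timeout_abs_ns;   /* absolute deadline in Xen system time (ns) */
    uint32_t flags;
};

#define VCPUOP_set_singleshot_timer 8   /* value from Xen's public vcpu.h */

extern int HYPERVISOR_vcpu_op(int cmd, unsigned int vcpuid, void *extra);

static int
arm_pv_timer(unsigned int cpu, uint64_t deadline_ns)
{
    struct vcpu_set_singleshot_timer single = {
        .timeout_abs_ns = deadline_ns,
        .flags          = 0,
    };

    /* When the deadline passes, Xen raises a timer event on 'cpu', which
     * is delivered through the per-CPU vector callback. */
    return (HYPERVISOR_vcpu_op(VCPUOP_set_singleshot_timer, cpu, &single));
}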

PV IPIs

As with PV timers, the introduction of the vector callback allows FreeBSD to drop the bare-metal IPI implementation and instead route IPIs through event channels. Again, this removes the emulation overhead and unnecessary VMEXITs, providing better performance.
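
A hedged sketch of the idea (again mirroring Xen's public event_channel.h; the per-CPU array, CPU count, and hypercall wrapper are illustrative assumptions, not the actual FreeBSD code): each CPU binds an IPI event channel once, and sending an IPI becomes a simple event-channel notification instead of an access to the emulated local APIC.

/* Sketch: per-CPU IPIs over event channels.  Structures and constants
 * mirror Xen's public event_channel.h; everything else is illustrative. */
#include <stdint.h>

typedef uint32_t evtchn_port_t;

struct evtchn_bind_ipi {
    uint32_t      vcpu;   /* IN: vCPU the IPI channel belongs to */
    evtchn_port_t port;   /* OUT: port allocated by Xen */
};

struct evtchn_send {
    evtchn_port_t port;   /* IN: port to notify */
};

#define EVTCHNOP_bind_ipi 7   /* values from Xen's public event_channel.h */
#define EVTCHNOP_send     4

extern int HYPERVISOR_event_channel_op(int cmd, void *arg);

#define NCPUS 8                         /* illustrative upper bound */
static evtchn_port_t ipi_port[NCPUS];   /* one IPI channel per vCPU */

static int
setup_pv_ipi(unsigned int vcpu)
{
    struct evtchn_bind_ipi bind = { .vcpu = vcpu };
    int error;

    error = HYPERVISOR_event_channel_op(EVTCHNOP_bind_ipi, &bind);
    if (error == 0)
        ipi_port[vcpu] = bind.port;     /* remember the port Xen assigned */
    return (error);
}

static void
send_pv_ipi(unsigned int vcpu)
{
    struct evtchn_send send = { .port = ipi_port[vcpu] };

    /* Raises an event on the target vCPU; no emulated APIC access needed. */
    (void)HYPERVISOR_event_channel_op(EVTCHNOP_send, &send);
}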

PV disk devices

FLUSH/BARRIER support has recently been added, together with a couple of fixes that allow FreeBSD to run with a CDROM driver under XenServer (something that was quite a pain for XenServer users).

Support for migration

Migration support has been reworked to handle the fact that timers and IPIs are now also paravirtualized, so migration continues to work with all of these new features.

Merge of the XENHVM config into GENERIC

In one of the most interesting improvements from a user/admin point of view (and similar to what the pvops Linux kernel already does), the GENERIC kernel on i386 and amd64 now includes full Xen PVHVM support, so there is no need to compile a Xen-specific kernel. When run as a Xen guest, the kernel detects the available Xen features and automatically makes use of them in order to obtain the best possible performance.
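
To give an idea of how the runtime detection works, here is a minimal sketch of the standard CPUID-based check for the Xen hypervisor signature (the well-known convention, not the exact FreeBSD code; the function name is made up for illustration):

/* Sketch: detect whether we are running as a Xen guest via CPUID.
 * Xen exposes hypervisor leaves starting at 0x40000000 (possibly offset
 * in 0x100 steps when several hypervisor interfaces are presented), with
 * the signature "XenVMMXenVMM" in EBX/ECX/EDX. */
#include <stdint.h>
#include <string.h>

static int
running_on_xen(void)
{
    uint32_t regs[3];
    uint32_t eax;
    char sig[13];

    for (uint32_t base = 0x40000000; base < 0x40010000; base += 0x100) {
        __asm__ __volatile__("cpuid"
            : "=a" (eax), "=b" (regs[0]), "=c" (regs[1]), "=d" (regs[2])
            : "a" (base));

        memcpy(sig, regs, sizeof(regs));
        sig[12] = '\0';

        /* Leaf base+1 holds the Xen version, base+2 the hypercall page. */
        if (strcmp(sig, "XenVMMXenVMM") == 0 && eax >= base + 2)
            return (1);
    }
    return (0);
}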

This work has been done jointly by Spectra Logic and Citrix.

Indirect descriptors for Xen PV disks

Some time ago Konrad Rzeszutek Wilk (the Xen Linux maintainer) came up with a list of possible improvements to the Xen PV block protocol, which is used by Xen guests to reduce the overhead of emulated disks.

The document is quite long, and the list of possible improvements is not small either, so we decided to implement them step by step. One of the first items that seemed like a good candidate for a performance improvement was what is called "indirect descriptors". This idea is borrowed from the VirtIO protocol and, to put it simply, consists of fitting more data into a single request. I will expand on how this is done later in the post, but first I would like to talk a little about the Xen PV disk protocol in order to understand it better.

Xen PV disk protocol

The Xen PV disk protocol is very simple: it only requires a shared memory page between the guest and the driver domain in order to queue requests and read responses. This shared memory page has a size of 4KB and is mapped by both the guest and the driver domain. The structure used by the frontend and the backend to exchange data is also very simple, and is usually known as a ring; it is a circular queue together with some counters pointing to the last produced requests and responses.
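
As a rough picture of the layout (loosely following what Xen's public io/ring.h generates with its DEFINE_RING_TYPES macro; the field names, slot count, and placeholder request/response types are simplifications for illustration):

/* Sketch of the single shared 4KB page used as a ring.  This is a
 * simplified illustration, not the exact layout from io/ring.h. */
#include <stdint.h>

#define RING_SLOTS 32   /* illustrative; the real count depends on slot size */

/* Placeholder request/response types; the real ones are protocol-specific
 * (e.g. blkif_request / blkif_response for PV disks). */
struct ring_request  { uint64_t id; /* plus operation-specific fields */ };
struct ring_response { uint64_t id; int16_t status; };

struct shared_ring_sketch {
    /* The frontend advances req_prod as it queues requests; the backend
     * advances rsp_prod as it posts responses.  The *_event fields are
     * used to decide when an event-channel notification is needed. */
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    /* (padding up to a 64-byte boundary omitted for brevity) */

    /* Each slot is reused: the frontend writes a request into it and the
     * backend later overwrites it with the corresponding response. */
    union {
        struct ring_request  req;
        struct ring_response rsp;
    } slot[RING_SLOTS];
};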


Impressions of the CentOS Dojo Antwerp 2013

Yesterday was the CentOS Dojo Antwerp 2013, where I delivered a talk about tuning Xen for better performance.

The event was very interesting, with lots of talks oriented especially at system administrators, so the team didn't want to miss this great opportunity to speak about Xen, especially considering that, not long ago, the first Xen packages for CentOS 6 were announced at FOSDEM. More information about the event can be found on the CentOS Dojo wiki page.

Alpine Linux and Xen

I started working on Alpine Linux and Xen integration some time ago, when I was working as a research assistant at UPC, my university. We had just bought some blades and we needed to deploy Xen on them easily. We realized these blades contained an SD and a USB slot, which could be used as a boot device. As it happened, just at this time there was a thread on xen-users about booting Dom0 from a USB stick. This was quite interesting to us because we needed something we could replicate easily across a set of identical servers, and being able to run your Dom0 from RAM has several advantages, as described in the thread.

This is when I started working on getting a Xen Dom0 onto a USB/SD card using Alpine Linux. Most of the initial changes focused on getting the toolstack to build and work with uclibc, the libc used by Alpine Linux (which indirectly led to the adoption of autoconf in the Xen project, but that's another story…). After some patching on the Xen side, my work focused on modifying the Alpine Linux configuration scripts (alpine-conf) and the image builder (alpine-iso). This added a new target to the Alpine build system: a Xen Dom0 LiveCD/USB. The first official release of this new "flavour" debuted in Alpine Linux 2.4 and contained Xen 4.1. Since then the Xen package has been updated to 4.2, and the configuration scripts have seen several improvements that make running a Dom0 from RAM even easier.

Since I'm no longer working for my university, and therefore no longer maintaining any servers, I've decided to ask some Alpine Linux users to give their opinions on why Alpine is a great choice for virtualization. Here are some of the most notable benefits of Alpine Linux compared to more traditional distributions:

From Der Tiger:

Even the base system installation of any major Linux distribution (e.g. Fedora, Ubuntu) creates much more overhead and requires higher-performance hardware than a smaller OS like Alpine Linux or Voyage Linux, without any considerable benefit. The implementation and distribution of bug fixes for any non-kernel-related problem takes forever in most popular Linux and FreeBSD distributions, while Alpine Linux has a very active Xen users and developers group supplying both maintenance of Xen and some degree of support through the Alpine Wiki and this mailing list.

From Florian Heigl:

  1. Size: It has a lean codebase and a very vanilla, modern kernel, which makes it realistic to apply changes without the risk of introducing new issues. One example that is relevant for me is Open vSwitch. Another is Flashcache, which is still catching up on resilience features like SSD-loss handling; there might be a nice and tiny patch that does all I want, so it is really important to be able to add some patches to Flashcache with smaller means than “yum -y upgrade”. On a distro that is mostly outdated, plus important patches, plus backports, this can be much, much harder, resulting in more frustration on my side, not to mention the difference in “OPEX” for doing those changes. Another example would be taming the monster named ixgbe + vnics. This, or NPIV, are highly relevant functions for virtualization if you want abstraction “done right”, and they’re practically impossible to use on major distros without replacing drivers.
  2. Unique features: I found out about Alpine not as a Xen distro – I found it because I was looking for something that goes beyond the commonly available networking options, be it very old (bridging) or newer (openvswitch + GRE as the current practice). I want to be able to run a really virtualized infrastructure, and after months I found out there is only one distro that offers DMVPN. Next I found out that this distro is made with a very network-engineering mindset, able to run from RAM and actually designed to make that easy. In the past I’ve made CentOS and later OracleVM (google: xen black magic) run “almost stateless” from a read-only root with a lot of RAM-backed tmpfs, and that is as far as you can take those RHEL-based distros. You’ll also be just as far from any idea of vendor support with such a setup. Just imagine my surprise when I found that Alpine comes with this designed in and is commonly run in a RAM-backed mode.
Why is that so important?
Because, let’s be honest, dom0 is not important. What matters are the VMs I’m running for other people, and that pesky dom0 should be incredibly indestructible and immortal so it can always serve them, no matter whether all my disks failed, the networking died, or the sun is sending nasty rays upsetting my ECC memory (yes, Alpine helps me with that too, because it has a current kernel and so can read the MCE data passed from Xen).

This is just a small sample of all the points given, so I would recommend that everyone give Alpine Linux a try. Every release starting from 2.4 comes with a Xen Dom0 LiveCD/USB that can be found on the downloads page. After all, you don't even need to install it to your hard drive :)

Improving block protocol scalability with persistent grants

The blkback/blkfront drivers developed by the original Xen team implemented a lightweight and fast zero-copy protocol that has served well for many years. However, as the number of physical cores and the number of guests on a system have increased, we have begun to identify some scalability bottlenecks. This prompted a recent improvement to the protocol, called "persistent grants". This blog post describes the original protocol and the source of the bottlenecks, and then describes persistent grants, along with experiments demonstrating the scalability improvements.

How the PV driver protocol works

Xen PV drivers for block devices use a special protocol to transfer data between the guest and the device that acts as storage. The protocol is based on a ring shared between the guest and the driver domain, which is usually Dom0, and which allows the two to exchange data. The driver domain has access to the physical hardware (a disk) or to the guest's virtual disk image, and the guest issues requests to perform operations on this storage.

The capacity of the shared ring is limited, as is the maximum number of requests that can be in flight at any given time. To avoid allocating a very large amount of shared memory up front, the shared ring allocates at start only the minimum amount of memory needed to queue the requests and responses (a single memory page), but not the data itself. The data is transferred using grants, which are memory areas shared on request; references to those grants are passed along with each request, so the driver domain can map those areas of memory and perform the requested operations.
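
To make the grant part concrete, here is a simplified sketch of what a queued block request looks like (following the classic structures in Xen's public io/blkif.h, with fields abbreviated for illustration): each request carries up to 11 segments, and each segment names a grant reference that the driver domain maps before performing the I/O.

/* Sketch of a classic PV block request, simplified from Xen's public
 * io/blkif.h for illustration. */
#include <stdint.h>

typedef uint32_t grant_ref_t;

#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11   /* classic protocol limit */

/* One segment: which granted page to map, and which 512-byte sectors of
 * that page the data occupies. */
struct blkif_request_segment {
    grant_ref_t gref;         /* grant reference for one guest page */
    uint8_t     first_sect;   /* first sector of the page used by the I/O */
    uint8_t     last_sect;    /* last sector of the page used by the I/O */
};

/* A request as placed on the shared ring by the frontend. */
struct blkif_request {
    uint8_t  operation;       /* read, write, flush, ... */
    uint8_t  nr_segments;     /* how many entries of seg[] are valid */
    uint16_t handle;          /* virtual device the request targets */
    uint64_t id;              /* opaque tag echoed back in the response */
    uint64_t sector_number;   /* starting sector on the virtual disk */
    struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST];
};

Mapping and unmapping those per-request grants is exactly the overhead that persistent grants, described below, aim to avoid.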
