Author Archives: Konrad Rzeszutek Wilk

About Konrad Rzeszutek Wilk

Konrad is a Senior Development Director at Oracle who has been working on Xen since 2009: first as a maintainer of the Xen tree in the upstream kernel, and now as release manager for Xen 4.5.

Less is More in the New Xen Project 4.5 Release

If we used code-names, the Xen 4.5 release would be called Panda on a Diet! We added 78K lines of new code and deleted 141K, a net reduction of 63K lines compared to the previous release.

The net effect of a skinnier Xen Project Hypervisor code base is increased usability, simplicity and innovation. This is all by design and one of many steps we’ll continue to take to fine-tune our development and release cycle.

For example, we shed the Python toolstack – including xend, which we deprecated in 4.3. This accounted for the majority of the code deleted in today’s release, which is a big boon for developers, who now have less code to maintain and can spend more time on new features.

And 4.5 is more feature-rich than any release in Xen Project’s history.


Today we are announcing Xen Project Hypervisor 4.5, with changes spanning architectures (x86 and ARM), platforms (various ARM, AMD and Intel boards) and generic code. The release also creates new opportunities to incorporate Xen virtualization into software stacks in markets like embedded computing, automotive, drones, avionics and more.

Virtualization and open source are more relevant than ever in today’s evolving, more software-centric data center too. New developments with hyper scale-out computing, Internet of Things, NFV/SDN, and next-generation ARM-based products are driving increased demand for better resource sharing and utilization with enough flexibility to efficiently grow well into the future. What isn’t likely to change anytime soon is the diversity of OSes, multi-tenant architectures, security concerns and storage and network challenges that cloud providers and enterprises must contend with to run their applications. Undeniably, abstraction at the VM level is necessary for superior performance and security in these environments.

Despite these impressive and rapid changes, or perhaps because of them, Xen Project developers are motivated to continually stay ahead of the market with performance, speed, agility and security improvements. Our traditional customers also inspire us; organizations such as Alibaba, Amazon Web Services, IBM Softlayer, Rackspace, Oracle and others are some of the most savvy and innovative users around.

To learn more about the release and for ease of reading, I’ve grouped the summary of updates into four major categories:

  • Hypervisor specific

  • Toolstack

  • External users of toolstack

  • Linux, FreeBSD, and other OSes that can utilize the new features.

x86 Hypervisor-Specific Updates

On the x86 side, development has focused on improving performance on various fronts:

  • The HPET code has been modified to provide faster access and better-resolution timer values.

  • Memory is scrubbed in parallel on bootup, giving a huge time boost for large-scale machines (1TB or more).

  • PVH initial domain support for Intel has been added, allowing Linux and FreeBSD kernels to run as dom0. PVH is an extension to the classic Xen Project Paravirtualization (PV) that uses the hardware virtualization extensions available on modern x86 server processors. Requiring no support beyond the hypervisor itself, a PVH kernel boots as the first guest and takes on the responsibilities of the initial domain, known as dom0. This means the Xen Project Hypervisor can take advantage of contemporary hardware features like virtual machine extensions (VMX) to significantly expedite execution of the initial domain: instead of asking the hypervisor to handle certain operations, dom0 can execute them natively without compromising security. For more background, Virtualization Spectrum is an excellent introduction to PVH.

  • Lower interrupt latency for PCI passthrough on large-scale machines (more than 2 sockets).

  • Multiple IOREQ servers for guests, a technique that assigns several QEMU instances to one domain. This speeds up guest operation by letting multiple backends (QEMUs) handle different emulations.
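As a sketch of how the experimental PVH dom0 mode above is switched on: the hypervisor command line gains one option (option name per the Xen 4.5 documentation; the kernel paths and memory sizing below are made up):

```
# GRUB2 entry sketch: boot Xen with a PVH initial domain.
# 'dom0pvh=1' asks the hypervisor to start dom0 in PVH mode;
# it requires a PVH-capable dom0 kernel.
multiboot /boot/xen.gz dom0pvh=1 dom0_mem=2048M,max:2048M
module /boot/vmlinuz-3.19 root=/dev/sda1 ro
module /boot/initrd.img-3.19
```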

We also expanded support for:

  • Soft affinity for vCPUs: Xen has had NUMA-aware scheduling since 4.3. In Xen 4.5, we build on that to make it more general and useful on non-NUMA systems. It is now possible for the sysadmin to define an arbitrary set of physical CPUs on which vCPUs prefer to run, and Xen will try as hard as possible to follow this indication.

  • Security improvements – guest introspection expansion: VM introspection using Intel EPT / AMD RVI hardware virtualization functionality builds on the Xen Project Hypervisor Memory Inspection APIs introduced in 2011. This addresses a number of security issues from outside the guest OS, without relying on functionality that can be rendered unreliable by advanced malware. The approach works by auditing access to sensitive memory areas using hardware support in guests with minimal overhead, and allows control software running within a dedicated VM to allow or deny attempts to access sensitive memory based on policy and security heuristics. You can find an excellent introduction to the topic of VM introspection here, and a video on YouTube (a recording of this presentation) explaining the new functionality in Xen 4.5.

  • Serial support for debug purposes. This covers PCIe cards (Oxford ones) and newer Broadcom ones found on blades.

  • Experimental support for Real-Time Scheduling: a new, multicore-enabled, real-time scheduler called RTDS is part of Xen 4.5 as an experimental feature. Virtualization will soon become the norm rather than the exception in automotive, avionics, mobile and multimedia, and other fields where predictability and high-end, real-time support are critical. Xen wants to play a big role in this, and this new scheduler makes that possible, which is why we introduced it in 4.5 while still under development. More information here: Youtube video, Linux Foundation presentation and related blog.
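For instance, soft affinity is a one-line addition to a guest’s xl configuration, and the new RTDS scheduler is driven through xl as well (a sketch; the guest name and CPU numbers are made up):

```
# xl guest config sketch: four vCPUs that may run anywhere,
# but prefer physical CPUs 0-3.
vcpus = 4
cpus_soft = "0-3"

# RTDS (experimental): boot Xen with 'sched=rtds', then set a
# vCPU period and budget in microseconds from dom0:
#   xl sched-rtds -d guest1 -p 10000 -b 2500
```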

Intel Hypervisor-Specific Updates

  • Broadwell Supervisor Mode Access Prevention. This LWN article has an excellent explanation of it – but a short summary is that it restricts the kernel from accessing user-space pages. Support for this feature in Xen also added alternative-assembler support to patch the hypervisor at run-time (so that we won’t be running these operations on older hardware).

  • Haswell Server Cache QoS Monitoring, aka Intel Resource Director Technology, is a “new area of architecture extension that seeks to provide better information and control of applications running on Intel processors. The feature, … documented in the Software Developers’ Manual, relates to monitoring application thread LLC usage, to provide a means of directing such usage and provide more information on the amount of memory traffic out of the LLC,” according to xen-devel.

  • SandyBridge (vAPIC) extensions. Xen 4.3 added support for VT-d Posted Interrupts, and in Xen 4.5 we added extensions for PVHVM guests to take advantage of them. Instead of using the vector callback, the guest can utilize the vAPIC to lower its VMEXIT overhead, leading to lower interrupt latency and performance improvements for I/O-intensive workloads in PVHVM guests.

AMD Hypervisor-Specific Updates

  • Fixes in the microcode loading.

  • Data Breakpoint Extensions and further MSR masking support for Kabini, Kaveri and newer. This allows one “to specify cpuid masks to help with cpuid levelling across a pool of hosts,” according to the xen-command-line manual.

ARM Hypervisor-Specific Updates

The ARM ecosystem operates differently than the x86 one: ARM licensees design new chipsets and features, and OEMs manufacture platforms based on these specifications. OEMs designing ARM-based platforms determine what they need on the SoC – the System on Chip – and can selectively enable or disable the functionality they consider important (or unimportant). ARM provides the Intellectual Property (IP) and standards from which OEMs can further specialize and optimize. Therefore the features the Xen Project Hypervisor supports on ARM are not tied to a specific platform, but rather to the functionality SoCs provide. New updates include:

  • Support for up to 1TB for guests.

  • The Generic Interrupt Controller (GIC) v3 is supported in Xen 4.5. v3 is very important because it introduces support for Message Signaled Interrupts (MSI), emulation of GICv3 for guests and – most importantly – support for more than 8 CPUs. Many of the new features are not used by Xen yet, but the driver is on par with v2.

  • Power State Coordination Interface 0.2 (PSCI) is important in embedded environments where power consumption needs to be kept to the absolute minimum. It allows us to power CPUs down and up, suspend them, etc.

  • UEFI booting. On ARM64 servers both U-Boot and UEFI can be used to boot the OS.

  • IOMMU support (SMMUv1). For isolation between guests, ARM platforms can come with an IOMMU chipset based on the SMMU specification.

  • Super Pages (2MB) support in Xen. Using super pages for the guest pseudo-physical to physical translation tables significantly improves overall guest performance.

  • Passthrough – PCI passthrough did not make it in time, but passthrough of MMIO regions did. In the ARM world it is quite common to have no PCIe devices and to access devices only through MMIO regions. This feature therefore allows driver domains to be in charge of network or storage devices.

  • Interrupt latency reduction: By removing maintenance interrupts, we get rid of an expensive trap into Xen for each interrupt EOI. Please see Stefano’s slides.
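To illustrate the non-PCI passthrough, a guest (for example a driver domain) can be handed a device’s MMIO range and interrupts directly from its xl configuration (a sketch; the page frame, page count and IRQ number are made up, and on ARM a matching device tree fragment is also required):

```
# xl guest config sketch: map one 4K page of a memory-mapped
# device into the guest. iomem takes "START,NUM" in page frames,
# so "0x47000,1" covers physical address 0x47000000.
iomem = [ "0x47000,1" ]
# Route the device's physical IRQ to this guest as well.
irqs  = [ 112 ]
```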

With these new features, the following motherboards are now supported in Xen Project Hypervisor 4.5:

  • AMD Seattle

  • Broadcom 7445D0 A15

  • Midway (Calxeda)

  • Vexpress (ARM Ltd.)

  • OMAP5, DRA7 (Texas Instruments)

  • Exynos5250 (Exynos 5 Dual), Odroid-Xu, and Exynos 5420 (Exynos Octa) (Samsung SoC for Arndale and various smartphones and tablets)

  • SunXI (AllWinner), aka A20/A21, CubieTruck, CubieBoard

  • Mustang (Applied Micro-X-Gene, the ARMv8 SoC)

  • McDivitt aka HP Moonshot cartridge (Applied Micro X-Gene)

  • The Xen Project also maintains this list of ARM boards that work with Xen Project software.

Toolstack Updates

Xen Project software now uses a C-based toolstack, xl (built on libxl), replacing the obsolete Python toolstack xend. This more modern architecture is easier to maintain, and users are largely unaffected by the move since xm and xl offer feature parity. In fact, the switch greatly simplifies managing Xen instances, as other toolstacks, such as libvirt, are C-based and less complex; libvirt and XAPI now use libxl as well. For more background, check out our new hands-on tutorial “XM to XL: A Short, but Necessary, Journey.”
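For most day-to-day operations the move is mechanical, since xl keeps the familiar verbs (a sketch; the guest name and config path are made up):

```
# Where you previously ran xm, xl accepts the same subcommands:
xl create /etc/xen/guest1.cfg   # was: xm create guest1.cfg
xl list                         # show running domains
xl console guest1               # attach to the guest console
xl shutdown guest1              # clean shutdown
```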

Additional toolstack changes include:

  • VM Generation ID. This allows Windows 2012 Server and later active directory domain controllers to be migrated.

  • Initial Remus support provides high availability by checkpointing guest state at high frequency.

  • Libxenlight (libxl) JSON infrastructure support. This allows libxenlight to use JSON to communicate with other toolstacks.

  • Libxenlight now keeps track of domain configuration, using the JSON infrastructure to do so. This brings feature parity with xend.

  • Systemd support. This allows one source base to contain the systemd unit files, which various distributions can use instead of each having to generate their own.

  • vNUMA, while still in progress, is coming along nicely thanks to sponsorship. Virtual NUMA allows Xen to expose a NUMA topology (either based on the host’s or made up) to the guest.
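The JSON infrastructure is visible from the command line too: xl can dump the configuration it retains for a running domain as a JSON document (a sketch; the output below is abbreviated and illustrative, not verbatim):

```
xl list --long guest1
# prints the stored domain configuration as JSON, roughly:
# {
#     "domid": 3,
#     "config": {
#         "c_info": { "type": "hvm", "name": "guest1" },
#         "b_info": { "max_vcpus": 4, "max_memkb": 1048576 }
#     }
# }
```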

On the libvirt side, changes include:

  • PCI/SR-IOV passthrough, including hot{un}plug

  • Migration support

  • Improved concurrency through job support in the libxl driver – no more locking entire driver when modifying a domain

  • Improved domxml-{to,from}-native support, e.g. for converting between xl config and libvirt domXML and vice versa

  • PV console support

  • Improved qdisk support

  • Support for <interface type=’network’> – allows using libvirt-managed networks in the libxl driver

  • Support PARAVIRT and ACPI shutdown flags

  • Support PARAVIRT reboot flag

  • Support for domain lifecycle event configuration, e.g. on_crash, on_reboot, etc

  • A few improvements for ARM

  • Lots of bug fixes

QEMU Updates

Xen Project 4.5 will ship with QEMU v2.0 and SeaBIOS v1.7.5 with the following updates:

  • Bigger PCI hole in QEMU via the mmio_hole parameter in the guest config. This allows users to pack more legacy PCI devices for passthrough into a guest.

  • QEMU is now built for ARM, providing backend support for the framebuffer (VNC).
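The mmio_hole knob is a single line in an HVM guest’s configuration (a sketch; per the xl.cfg documentation the value is in megabytes and requires the qemu-xen device model):

```
# HVM guest config sketch: widen the MMIO hole below 4GiB to 2GiB
# so the BARs of additional passthrough devices fit.
builder = "hvm"
device_model_version = "qemu-xen"
mmio_hole = 2048
```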


The 4.5 release also takes advantage of new features in Linux and FreeBSD, such as PVH support (which is considered experimental).


With 43 major new features, 4.5 includes the most updates in our project’s history. That’s not even counting 22 new enablers upstream (in Linux and QEMU). The Project is also taking a more holistic, proactive approach to managing dependencies such as Linux and QEMU, as well as downstream functionality such as libvirt. In 2015, we plan to build on this even further up the stack to include OpenStack and other key projects. For the first time, our Project’s development process is robust, active and mature enough to systematically focus on these strategic growth opportunities. It also reflects enhanced responsiveness to community feedback; for example, we’re improving usability and performing broader testing for specific use cases with new releases.

During this development and release cycle we’ve seen a steady influx of folks helping, contributing, testing and reporting. As the Release Manager, I would like to thank everybody and call out major contributions coming from AMD, Bitdefender, Citrix, Fujitsu, GlobalLogic, Intel, Linaro, Oracle, SuSE and Cavium, as well as several individuals and academic institutions.

The sources are located in the git tree or one can download the tarball:

  • xen: with a recent enough git, just pull from the proper tag (RELEASE-4.5.0) from the main repo directly:

  • git clone -b RELEASE-4.5.0 git://

  • With an older git version (and/or if that does not work, e.g., complaining with a message like this: Remote branch RELEASE-4.5.0 not found in upstream origin, using HEAD instead), do the following:

  • git clone git:// ; cd xen ; git checkout RELEASE-4.5.0

  • tarball: the 4.5.0 tarball and its signature.

Release Documentation can be found on our wiki.

Linux 3.14 and PVH

Linux v3.14 will sport a new mode in which the Linux kernel can run, thanks to Mukesh Rathor (Oracle).

Called ‘ParaVirtualized Hardware,’ it allows the guest to utilize many hardware features – while at the same time having no emulated devices. It is the next step in PV evolution, and it is pretty fantastic.

Here is a great blog that explains the background and history in detail at:
The Paravirtualization Spectrum, Part 2: From poles to a spectrum.

The short description is that Xen guests can run as HVM or PV guests. PV is a mode where the kernel lets the hypervisor program page tables, segments, etc. With the EPT/NPT capabilities in current processors, the overhead of doing this inside an HVM (Hardware Virtual Machine) container is much lower than having the hypervisor do it for us. In short, we let a PV guest run without routing page-table, segment, syscall, etc. updates through the hypervisor – instead it is all done within the guest container.

It is a hybrid PV – hence the ‘PVH’ name – a PV guest within an HVM container.
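Trying it out is, experimentally, a one-line change to an ordinary PV guest configuration plus a capable kernel (a sketch; the kernel path and sizing are made up):

```
# PV guest config sketch: run this guest in PVH mode.
# Needs a PVH-capable kernel, e.g. Linux 3.14+ with CONFIG_XEN_PVH.
kernel = "/boot/vmlinuz-3.14"
memory = 1024
vcpus  = 2
pvh    = 1
```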

Linux 3.3!

On March 18th, Linux 3.3 was released and it featured a number of interesting Xen related features.

  • Re-engineered how tools can perform hypercalls – by using a standard interface (/dev/xen/privcmd instead of using /proc/xen/privcmd)
  • Backends (netback, blkback) can now function in HVM mode. This means that a device driver domain can be in charge of a device (say, the network card) and its backend (netback). What is exciting about this is that it allows for security by isolation – if one domain is compromised, it does not affect the other domains. Both Qubes and the NSA Research Center have been focusing on this functionality, and it is exciting to see components of this goal taking shape!

Xen in Linux 3.2, 3.3 and Beyond

Linux 3.2
Linux 3.2 was released on Jan 4th and compared to earlier kernel releases, this one was very focused on fixing bugs reported by the community.

Thank you!!

Issues that caused lots of bug reports were:

  • The xen-platform-pci module (used by HVM guests to enable PV drivers) was frequently not included in the installer (now fixed by building it into the kernel and fixing the installer builders).
  • ‘xl list’ vs ‘xm list’ discrepancy: this was caused by the guest not having the memory in the “right” sections.
  • Others were related to issues found with Amazon EC2, and bug fixes from Linux distributions (Debian, Canonical, Fedora, Red Hat, Citrix  and Oracle).
  • We also fixed boot issues for Dell machines.

We are all quite grateful to the community for reporting these issues! It can take some time to find a root cause, but we do want to get them all fixed and hope that you will be patient with us.

On the “feature” side, we:

  • cleaned the code
  • added support for big machines with more than 256 PCI devices
  • added kexec support for PVonHVM (which sadly broke Amazon EC2, so we are going to redo it)
  • laid initial groundwork for HVM device driver domains
  • added features to support discard (TRIM or UNMAP) in the block layer along with the emulation of barriers

Linux 3.3
The Linux 3.3 merge window opened a week ago, and we had a similar pattern of patches: documentation cleanups (Thanks to the Document Day), security fixes, fixes in the drivers, driver cleanups, and fixes in the config options.

Feature-wise, a new driver for doing ioctls to the hypervisor was introduced, along with more infrastructure changes to improve the netback driver (grant table and skb changes) and work to make the netback driver able to run in an HVM guest (blkback is coming next). On the graphics side, a DMA pool type was introduced in the TTM backend (used by both radeon and nouveau to fetch/put all of the pages used by the adapter) so that it works faster and also properly under Xen (the major issues were with 32-bit cards). i915 does not use TTM, so it did not need this.

Linux 3.4 and beyond
So what is next? The top things we want to accomplish this year are to:

  • Make ACPI power management work with Xen.
  • Make netback work much much better than it does now!
  • Allow backends and xenstore to run in guests, allowing separate device driver domains
  • Improve the documentation
  • Fix more bugs!

There are other items on this list too, but these are the most important right now.

Linux 3.0 – How did we get initial domain (dom0) support there?

About a year ago, my first patchset that laid the groundwork to enable the initial domain (dom0) was accepted into the Linux kernel. It was tiny: a total of around 50 new lines of code. Great, except that it took me seven months to actually get to this stage.

It did not help that the patchset had gone through eight revisions before the maintainer decided that he could sign off on it. Based on those timelines I figured the initial domain support would be ready around 2022 🙂

Fast-forward to today and we have initial domain support in Linux kernel with high-performance backends.

So what made it go so much faster (or slower, if you have been waiting for this since 2004)? Foremost was the technical challenge of dealing with code that “works” but hasn’t been cleaned up. This is the same problem that OEMs have when they try to upstream their in-house drivers – the original code “works” but is a headache to maintain and is filled with #ifdef LINUX_VERSION_2_4_3, #else..

To understand why this happens to code, put yourself in the shoes of an engineer who has to deliver a product yesterday. The first iteration is minimal – just what it takes to do the job. The next natural step is to improve the code and clean it up, but then another hot bug lands on the engineer’s lap, and there isn’t enough time to go back and improve the original code. At the end of the day the code is filled with weird edge cases, code paths that seemed right but maybe aren’t anymore, etc.

The major technical challenge was picking up this code years later, figuring out its peculiarities, its intended purposes and how it diverged from its intended goal, and then rewriting it correctly without breaking anything. The fun part is that it is like giving it a new life – not only can we do it right, but we can also fix all those hacks the original code had. In the end we (I, Jeremy Fitzhardinge, Stefano Stabellini, and Ian Campbell) ended up cleaning up generic code and then providing the Xen-specific code alongside it. That is pretty neat.

Less technical, but also important, was the challenge of putting ourselves in the shoes of a maintainer so that we could write the code to suit the maintainer. I learned this the hard way with the first patchset, where it took a good seven months for me to finally figure out how the maintainer wanted the code to be written – which was “with as few changes as possible,” while I was writing abstract APIs with backend engines – complete overkill. Getting it right the first time really cuts down the time it takes for a maintainer to accept patches.

The final thing is patience – it takes time to make sure the code is solid. More often than not, only the third or fourth revision of the code was pretty and right. This meant that for every revision we had to redo the code, retest it, and get people to review it – and that can take quite some time. The effect was that per merge window (every three months) we tried to upstream only one or maybe two components, as we did not have any other pieces of code ready or did not feel they were ready yet. We now do reviews on the xen-devel mailing list before submitting to the Linux Kernel Mailing List (LKML) and the maintainer.

So what changed between 2.6.36 (where the SWIOTLB changes were committed) and 3.0 to make the Linux kernel capable of booting as the first guest launched by the Xen hypervisor?

Around 600 patches. Architecturally, the first component was Xen-SWIOTLB, which allowed the DMA API (used by most device drivers) to translate between guest virtual addresses and physical addresses (and vice versa). Then came the Xen PCI frontend driver and the Xen PCI library (quite important). The latter provided the glue for the PCI API (which mostly deals with IRQ/MSI/MSI-X) to utilize the Xen PCI frontend. This meant that when a guest tried to interrogate the PCI bus for configuration details (which you can see yourself by running ‘lspci -v’), all those requests were tunneled through the Xen PCI frontend to the backend. Requests to set up IRQs or MSIs were likewise tunneled through the Xen PCI backend. In short, we allowed PCI devices to be passed in to Para-Virtualized (PV) guests and function.

Architecture of how device drivers work in Xen DomU PV
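With those pieces in place, handing a PCI device to a PV guest becomes a short sequence (a sketch in today’s xl syntax – at the time these pieces landed, the xm toolstack was still current; the BDF address and guest name are made up):

```
# Detach the device from dom0's driver and mark it assignable,
# then attach it to the PV guest. Inside the guest, the Xen PCI
# frontend surfaces it, so 'lspci -v' shows it like a normal device.
xl pci-assignable-add 0000:03:00.0
xl pci-attach guest1 0000:03:00.0
```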

The next part was the ACPI code. The ACPI code calls the IRQ layer at bootup to tell it which device has which interrupt (the ACPI _PRT tables). When a device is enabled (loaded), it calls the PCI API, which in turn calls the IRQ layer, which then calls into the ACPI API. The Xen PCI library (which I mentioned earlier) had provided the infrastructure to route the PCI API calls through – so we extended it and added the ACPI callback. This meant that it could use the ACPI API instead of the Xen PCI frontend, as appropriate – and voilà, the interrupts were now enabled properly.

Architecture of how device drivers plumb through when running in Dom0 or PVonHVM mode.
When 2.6.37 was released, the Linux kernel booted under the Xen hypervisor! It was very limited (no backends, not all drivers worked, some IRQs never got delivered), but it kind of worked. Much rejoicing happened on Jan 4th, 2011 🙂

Then we started cracking on the bugs and adding infrastructure pieces for the backends. I am not going to go into details – but there were a lot of patches in many, many areas. The first backend to be introduced was xen-netback, which was accepted in 2.6.39. The second one – xen-blkback – was accepted right after that, in 3.0.

With Linux 3.0 we now have the major components needed to be classified as a working initial domain – aka dom0.

There is still work to do – we have not fully worked out the ACPI S3 and S5 support, or the 3D graphics support – but the majority of needs will be satisfied by the 3.0 kernel.

I have skimped here on the underlying code called paravirt – which Jeremy had been working on tirelessly since 2.6.23, and which made it all possible – but that is a topic for another article.