Benchmarking the new PV ticketlock implementation

This post was written collaboratively by Attilio Rao and George Dunlap

Operating systems are generally written assuming that they are in direct control of the hardware. So when we run operating systems in virtual machines, where they share the hardware with other operating systems, this can sometimes cause problems. One of the areas addressed by a recently proposed patch series is the problem of spinlocks on a virtualized system. So what exactly is the problem here, and how does the patch solve it? And what is the effect of the patch when the kernel is running natively?

Spinlocks and virtualization

Multiprocessor systems need to be able to coordinate access to important data, to make sure that two processors don’t attempt to modify things at the same time. The most basic way to do this is with a spinlock. Before accessing data, the code will attempt to grab the spinlock. If code running on another processor is holding the spinlock, the code on this processor will “spin” waiting for the lock to be free, at which point it will continue. Because those waiting for the spinlock are doing “busy-waiting”, code should try to hold the spinlock only for relatively short periods of time.

Continue reading

Dom0 Memory — Where It Has Not Gone

If you are upgrading the domain 0 Linux kernel from a non-pvops (classic, 2.6.18/2.6.32/etc.) kernel to a pvops one (3.0 or later), you may find that the amount of free memory inside dom0 has decreased significantly.  This is because of changes in the way the kernel handles the memory given to it by Xen.  With some updates and configuration changes, the “lost” memory can be recovered.

tl;dr: If you previously had ‘dom0_mem=2G’ as a command line option to Xen, change this to ‘dom0_mem=2G,max:2G’.  If that didn’t help, read on.
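For example, on a system that boots Xen via GRUB 2, the option goes on the Xen (hypervisor) command line. The file and variable names below are typical Debian-style defaults and may differ on your distribution:

```sh
# /etc/default/grub -- Debian-style GRUB 2 defaults (names vary by distro)
GRUB_CMDLINE_XEN="dom0_mem=2G,max:2G"
# then regenerate the boot config, e.g.:
#   update-grub
```

The `max:` part caps the ballooning target as well as the boot allocation, which is what keeps the pvops kernel from reserving page structures for memory it will never receive.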

Continue reading

Linux 3.3!

On March 18th, Linux 3.3 was released, and it featured a number of interesting Xen-related features.

  • Re-engineered how tools perform hypercalls, using a standard interface (/dev/xen/privcmd instead of /proc/xen/privcmd)
  • Backends (netback, blkback) can now function in HVM mode. This means that a device driver domain can be in charge of a device (say, a network card) and a subset of the network stack (netback). What is exciting about this is that it allows for security by isolation: if one domain is compromised, it does not affect the other domains. Both Qubes and the NSA Research Center have been focusing on this functionality, and it is exciting to see components of this goal taking shape!

Xen in Linux 3.2, 3.3 and Beyond

Linux 3.2
Linux 3.2 was released on Jan 4th and compared to earlier kernel releases, this one was very focused on fixing bugs reported by the community.

Thank you!!

Issues that caused lots of bug reports were:

  • The xen-platform-pci module (used by HVM guests to enable PV drivers) was frequently not included in the installer (this is now fixed by building the module into the kernel and fixing the installer builders).
  • ‘xl list’ vs ‘xm list’ discrepancy: this was caused by the guest not having the memory in the “right” sections.
  • Others were related to issues found with Amazon EC2, and to bug fixes from Linux distributions (Debian, Canonical, Fedora, Red Hat, Citrix, and Oracle).
  • We also fixed boot issues for Dell machines.

We are all quite grateful to the community for reporting these issues! It can take some time to find the root cause of a reported issue, but we do want to get them all fixed and hope you will be patient with us.

On the “feature” side, we:

  • cleaned the code
  • added support for big machines with more than 256 PCI devices
  • added kexec support for PVonHVM (which sadly broke Amazon EC2, so we are going to redo it)
  • laid out initial work for HVM device driver domains
  • added features to support discard (TRIM or UNMAP) in the block layer along with the emulation of barriers

Linux 3.3
The Linux 3.3 merge window opened a week ago, and we had a similar pattern of patches: documentation cleanups (Thanks to the Document Day), security fixes, fixes in the drivers, driver cleanups, and fixes in the config options.

Feature-wise, a new driver was introduced for issuing ioctls to the hypervisor, more infrastructure changes landed to improve the netback driver (grant table and skb changes), and the netback driver was enabled to work in an HVM guest (blkback is coming next). On the graphics side, a DMA pool type was introduced in the TTM backend (used by both radeon and nouveau to fetch/put all of the pages used by the adapter) so that it works faster and also properly under Xen (the major issues were with 32-bit cards). i915 does not use TTM, so it did not need this.

Linux 3.4 and beyond
So what is next? The top things we want to accomplish this year are to:

  • Make ACPI power management work with Xen.
  • Make netback work much much better than it does now!
  • Allow backends and xenstore to run in guests, allowing separate device driver domains
  • Improve the documentation
  • Fix more bugs!

There are other items on this list too, but these ones are the most important right now.

Linux 3.0 – How did we get initial domain (dom0) support there?

About a year ago (https://lkml.org/lkml/2010/6/4/272) my first patchset that laid the groundwork to enable the initial domain (dom0) was accepted into the Linux kernel. It was tiny: a total of around 50 new lines of code. Great, except that it took me seven months to actually get to this stage.

It did not help that the patchset had gone through eight revisions before the maintainer decided that he could sign off on it. Based on those timelines I figured the initial domain support would be ready around 2022 🙂

Fast-forward to today and we have initial domain support in Linux kernel with high-performance backends.

So what made it go so much faster (or slower, if you have been waiting for this since 2004)? Foremost was the technical challenge of dealing with code that “works” but hasn’t been cleaned up. This is the same problem that OEMs have when they try to upstream their in-house drivers – the original code “works” but is a headache to maintain and is filled with #ifdef LINUX_VERSION_2_4_3, #else..

To understand why this happens to code, put yourself in the shoes of an engineer who has to deliver a product yesterday. The first iteration is minimal – just what it takes to do the job. The next natural step is to improve the code and clean it up, but .. another hot bug lands on the engineer’s lap, and then there isn’t enough time to go back and improve the original code. At the end of the day the code is filled with weird edge cases, code paths that seemed right but maybe aren’t anymore, etc.

The major technical challenge was picking up this code years later, figuring out its peculiarities, its intended purposes and how it diverged from its intended goal, and then rewriting it correctly without breaking anything. The fun part is that it is like giving it a new life – not only can we do it right, but we can also fix all those hacks the original code had. In the end we (I, Jeremy Fitzhardinge, Stefano Stabellini, and Ian Campbell) ended up cleaning up the generic code and then providing the Xen-specific code alongside it. That is pretty neat.

Less technical, but also important, was the challenge of putting ourselves in the shoes of a maintainer so that we could write the code to suit the maintainer. I learned this the hard way with the first patchset, where it took a good seven months for me to finally figure out how the maintainer wanted the code to be written – which was “with as few changes as possible.” Meanwhile I was writing abstract APIs with backend engines – complete overkill. Getting it right the first time really cuts down the time for the maintainer to accept the patches.

The final thing is patience – it takes time to make sure the code is solid. More often than not, it was the third or fourth revision of the code that was both pretty and right. This meant that for every revision we had to redo the code, retest it, and get people to review it – and that can take quite some time. The effect was that per merge window (every three months) we tried to upstream only one or maybe two components, as we did not have any other pieces of code ready or did not feel they were ready yet. We now do reviews on the xen-devel mailing list before submitting to the Linux Kernel Mailing List (LKML) and the maintainer.

So what changed between 2.6.36 (where the SWIOTLB changes were committed) and 3.0 to make Linux kernel capable of booting as the first guest by the Xen hypervisor?

Around 600 patches. Architecturally, the first component was the Xen-SWIOTLB, which allowed the DMA API (used by most device drivers) to translate between guest virtual addresses and physical addresses (and vice versa). Then came the Xen PCI frontend driver and the Xen PCI library (quite important). The latter provided the glue for the PCI API (which mostly deals with IRQ/MSI/MSI-X) to utilize the Xen PCI frontend. This meant that when a guest tried to interrogate the PCI bus for configuration details (which you can see yourself by running ‘lspci -v’), all those requests were tunneled through the Xen PCI frontend to the backend. Requests to set up IRQs or MSIs were likewise tunneled through the Xen PCI backend. In short, we allowed PCI devices to be passed in to Para-Virtualized (PV) guests and function.

Architecture of how device drivers work in Xen DomU PV

The next part was the ACPI code. The ACPI code calls the IRQ layer at bootup to tell it which device has which interrupt (the ACPI _PRT tables). When a device is enabled (loaded) it calls the PCI API, which in turn calls the IRQ layer, which then calls into the ACPI API. The Xen PCI library (which I mentioned earlier) had provided the infrastructure to route the PCI API calls through – so we extended it and added the ACPI call-back, which meant it could use the ACPI API instead of the Xen PCI frontend, as appropriate – and voilà, the interrupts were now enabled properly.

Architecture of how device drivers plumb through when running in Dom0 or PVonHVM mode.
When 2.6.37 was released, the Linux kernel booted under the Xen hypervisor! It was very limited (no backends, not all drivers worked, some IRQs never got delivered), but it kind of worked. Much rejoicing happened on Jan 4th, 2011 🙂

Then we started cracking on the bugs and adding infrastructure pieces for the backends. I am not going to go into detail – there were a lot of patches in many, many areas. The first backend to be introduced was xen-netback, which was accepted in 2.6.39.  And the second one – xen-blkback – was accepted right after that, in 3.0.

With Linux 3.0 we now have the major components needed to be classified as a working initial domain – aka dom0.

There is still work though – we have not fully worked out the ACPI S3 and S5 support, or the 3D graphics support – but the majority of the needs will be satisfied by the 3.0 kernel.

I have skimped here on the underlying code called paravirt, which Jeremy had been working on tirelessly since 2.6.23 – and which made it all possible – but that is a topic for another article.

Xen celebrates full Dom0 and DomU support in Linux 3.0

This is a very short blog post, as both Wim Coekaert and Ewan Mellor beat me by some time in publishing this great news: I was too busy traveling and celebrating. The fantastic news is that Linux 3.0 will have everything necessary to run Xen both as a management domain and as a Xen guest. You can find more information in the following two posts.

A big thank you to everybody in the community who helped make this happen. To conclude let me quote from Wim’s blog post, as it captures really well what this means for our community!

“All this means that every single bit of support needed in Linux to work perfectly well with Xen is -in- the mainline kernel tree. I’ve heard over the last few years, competitors use “There is no Xen support in Linux” as a tagline to create FUD with the Xen userbase and promote alternatives. Well, it’s all there, people. As Linux evolves, now, within that code base, the Linux/Xen bits will evolve at the same rate without separate patch trees and big chunks of code to carry along. This is great. Xen is a great hypervisor with capabilities and features that one cannot achieve in a non-true hypervisor architecture.”