Monthly Archives: August 2008

Xen 3.3 Press Release

The official press release announcing Xen 3.3 has been posted here. There are many partner quotes in the release from Oracle, Novell, Intel, AMD, Sun, IBM, Fujitsu, Samsung, Neocleus, Citrix, SignaCert, and others, and I encourage everyone in the community to take a look. I just got a Google News email for the word “Xen”, and a lot of news outlets are picking up the release (e.g. MarketWatch, Sys-Con, OStatic, Redmond Developer News, etc.).

Congrats again on the great community effort in getting this release out to the world…

Xen 3.3 Feature: Optimized HVM Video Memory Tracking

From Samuel Thibault:

If you have a look at how much CPU time is used while an HVM guest is idle, you can notice that the ioemu process used to permanently take something like 7%. This is because ioemu used to keep checking the content of the HVM video RAM for modifications, since setting up a trap on every guest video write would slow guest video operations down terribly. In Xen 3.3, ioemu asks the hypervisor to track video memory modifications. The hypervisor can do this more efficiently since it has access to the dirty bit that the processor automatically sets in the page table flags on write accesses to pages. As a result, instead of regularly comparing 8MB of video memory, ioemu just makes a hypercall to read the list of dirty pages. As an additional optimization, if no modification has occurred for two seconds, write access to the entire video memory is dropped until the guest writes to it again, which saves even the page table walk.
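
To put rough numbers on it: with 4KB pages, the 8MB of video RAM corresponds to 2048 pages, so the dirty-page bitmap that ioemu reads through the hypercall is only 2048 bits (256 bytes) per poll, instead of a full 8MB comparison.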

The result is that the CPU time goes down to around 0.3%!

Xen 3.3 Feature: PV-GRUB

From Samuel Thibault:

The traditional way to configure a PV guest is to put the path of the kernel/initrd to be loaded in the configuration file.  However, logically enough, these should live on the PV guest disk image, so that they can be managed by the distribution installed inside the PV guest.  PyGRUB used to act as a “PV bootloader”: it runs in dom0 as root, opens the PV disk image, reads its GRUB menu.lst, presents a GRUB-like menu to let the user choose a kernel, copies that kernel to the dom0 filesystem, then closes the disk image and finally tells the domain builder to use that copy.  Such a dom0 root process that parses user-provided data is a potential security hole.
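
For comparison, a PyGRUB-based PV configuration typically looks something like the sketch below; the pygrub path and the disk image path are the usual defaults and placeholders, and may differ on your system:

    # PyGRUB runs as root in dom0 and extracts the kernel from the guest's disk image
    bootloader = "/usr/bin/pygrub"
    disk = [ "tap:aio:/path/to/guest.img,xvda,w" ]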

PV-GRUB, on the other hand, is the real GRUB source code recompiled against Mini-OS, and works much more like a usual bootloader: it runs inside the very PV domain that will host the PV guest.  In the PV domain configuration file, one just gives the path to the PV-GRUB kernel.  PV-GRUB boots inside the PV domain, detects the domain’s PV disks and network interfaces, and uses those to access the PV guest’s menu.lst, show the GRUB menu on the regular PV console, and load the kernel image from the guest disk image through the same PV interfaces.  Some black magic is then used to boot the PV guest kernel from inside the PV domain (see the Summit slides for the details).  The limitation, however, is that it cannot perform a 32/64-bit switch: to boot a 32-bit (resp. 64-bit) PV kernel, a 32-bit (resp. 64-bit) PV-GRUB is needed.  A bonus feature of PV-GRUB is that network boot is also possible, both for fetching the menu.lst and the kernel/initrd, and it works exactly like regular GRUB.
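
As a rough sketch, a PV-GRUB configuration instead points the kernel line at the PV-GRUB image shipped with Xen 3.3 and passes the menu.lst location via extra; the install path below is the usual default and may differ on your system:

    # PV-GRUB runs inside the guest's own PV domain; match its bitness to the guest kernel
    kernel = "/usr/lib/xen/boot/pv-grub-x86_32.gz"   # use pv-grub-x86_64.gz for a 64-bit guest
    extra = "(hd0,0)/boot/grub/menu.lst"
    disk = [ "tap:aio:/path/to/guest.img,xvda,w" ]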

As a result, PV-GRUB is far more secure than PyGRUB, as it only uses the very resources that the PV guest itself will use.

See Summit slides:

Xen 3.3 Feature: HVM Device Model Domain

From Samuel Thibault:

To provide HVM domains with virtual hardware, Xen uses a modified version of qemu, ioemu.  It used to run in dom0 as a root process, since it needs direct access to disks and the network tap.  That poses both a security problem, as the qemu code base was not particularly meant to be safe, and an efficiency problem: when an HVM guest performs an I/O operation, the hypervisor hands control to dom0, which may not schedule the ioemu process immediately, leading to uneven performance.

In Xen 3.3, ioemu can be run in a Stub Domain (see the previous article on Stub Domains).  That means that for each HVM domain there is a dedicated Device Model Domain that processes the I/O requests of the HVM guest.  The Device Model Domain then uses the regular PV interface to actually perform disk and network I/O.  That restricts any harm ioemu could do to whatever the regular PV interface allows.  From a performance point of view, the benefit is twofold: since ioemu runs directly in the same address space as Mini-OS, it runs more efficiently (the cost of e.g. select(), clock_gettime(), etc. is greatly reduced); and since it runs as a domain, the hypervisor can schedule it directly, which keeps the latency of I/O operations to a minimum.  The result is that disk performance gets even closer to native, while network bandwidth is doubled!
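
As a sketch of what this looks like in practice (following the stubdom README shipped with the 3.3 tree; the exact paths may differ on your system), the HVM guest’s configuration switches the device model from the dom0 qemu-dm process to the stub domain wrapper:

    # classic setup: the device model runs as a root process in dom0
    #device_model = "/usr/lib/xen/bin/qemu-dm"
    # stub domain setup: the device model runs in its own dedicated PV domain
    device_model = "/usr/lib/xen/bin/stubdom-dm"

The stub domain itself also gets a small companion configuration file describing its own PV console, disks and network interfaces; see the stubdom README for the exact naming and contents.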

See Summit slides:

Xen 3.3 Feature: Stub Domains

From Samuel Thibault:

Having domain 0 run a lot of components (physical device drivers, the domain builder, the ioemu device models, PyGRUB, etc.) has been worrisome from a security point of view, particularly since most of them run as root, so a breach there could be disastrous.  It also poses scalability issues, since the hypervisor cannot itself schedule these components appropriately.  The goal of domain 0 disaggregation is thus to move them into separate domains: driver domains, a builder domain, device model domains, etc.

Mini-OS used to be just a small PV kernel serving as a sample of how a PV guest works.  In Xen 3.3, it has been extended to the point of being able to run the newlib C library and the lwIP stack, thus providing a basic POSIX environment, including TCP/IP networking.  This makes it quite easy to embed an application in a dedicated Xen domain by simply recompiling it against that environment.

Everything gets linked together as a kernel, which can then simply be started like any PV guest kernel.  In Xen 3.3, it is thus now possible to have the device model and GRUB running in their own domains, as described in further blog posts.
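
For instance, once such an application has been linked into a Mini-OS kernel image, it is launched with an ordinary PV configuration file; the kernel path and resources below are purely illustrative:

    kernel = "/usr/lib/xen/boot/my-app-stubdom.gz"   # hypothetical stub domain image
    memory = 32
    vif = [ "" ]   # give the stub domain a PV network interface if the application needs one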

On the technical side, the additional features of Mini-OS include:

– Disk frontend
– FrameBuffer frontend
– FileSystem frontend (to access part of the dom0 filesystem)
– Improved memory management: read-only memory and copy-on-write for zeroed pages
– Bug fixes!

But the simplicity (and thus the efficiency) of Mini-OS is preserved:

– Single address space (in particular, no kernel/user separation, completely eliminating system call costs)
– Single CPU
– Threads without preemption for Mini-OS internal use, not exposed at the POSIX layer.

Both C and Caml “hello world” samples are provided to get started with developing a stub domain.

See Summit slides: