As promised, here is the poll for the security discussion. As a reminder, the purpose of this poll is mainly to see where people’s attitudes are with respect to the various options, so that we can move the discussion forward towards a conclusion. If you have any interest at all in the outcome, please make your voice heard.
The poll will not be secret. You may fill out the poll anonymously, but if you do, your vote will be given less weight (to avoid ballot stuffing). We don’t necessarily plan on publishing the individual poll responses, but we may do so if we think it would be helpful.
Because of the summer holidays, we will keep the poll open for two weeks; we will tabulate the results Monday, August 20.
The poll can be found here:
Thank you for your time.
The Xen Security team recently disclosed a vulnerability, Xen Security Advisory 7 (CVE-2012-0217), which would allow guest administrators to escalate to hypervisor-level privileges. The impact is much wider than Xen: many other operating systems turn out to have the same vulnerability, including NetBSD, FreeBSD, and some versions of Microsoft Windows (including Windows 7).
So what was the vulnerability? It has to do with a subtle difference in the way in which Intel processors implement error handling in their version of AMD’s SYSRET instruction. The SYSRET instruction is part of the x86-64 standard defined by AMD. If an operating system is written according to AMD’s spec, but run on Intel hardware, the difference in implementation can be exploited by an attacker to write to arbitrary addresses in the operating system’s memory. This blog will explore the technical details of the vulnerability.
One of the goals for the 4.2 release is for xl to have feature parity with xm for the most important functions. But along the way, we’ve also been adding a number of improvements to the interface. One of the ways in which xl has changed and improved the interface is in passing PCI devices directly through to VMs.
A basic device pass-through review
As you may know, Xen has for several years had the ability to “pass through” a pci device to a guest, allowing that guest to control the device directly. This has several applications, including driver domains and increased performance for graphics or networking.
To pass through a device, you need to find out its BDF (Bus, Device, Function). A BDF consists of three or four numbers in the format DDDD:bb:dd.f, where:
DDDD is a 4-digit hex for the PCI domain. This is optional (if not included, it will be assumed to be 0000).
bb is a 2-digit hex of the PCI bus number
dd is a 2-digit hex of the PCI device number
f is a 1-digit decimal of the PCI function number
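To make the format concrete, here is a minimal shell sketch; normalize_bdf is a hypothetical helper (not part of xl) that fills in the optional domain when it has been omitted:

```shell
# Hypothetical helper: normalize a BDF string to the full DDDD:bb:dd.f
# form, assuming the default domain 0000 when none is given.
normalize_bdf() {
    case "$1" in
        *:*:*) printf '%s\n' "$1" ;;       # domain already present
        *)     printf '0000:%s\n' "$1" ;;  # no domain; assume 0000
    esac
}

normalize_bdf "03:00.0"        # prints 0000:03:00.0
normalize_bdf "0000:03:00.0"   # prints 0000:03:00.0
```

In practice you would find the BDFs of your devices with lspci, and hand a string in this form to xl (for example, xl pci-attach takes a domain name and a BDF).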
Among the more distinctive features of Xen 4.2 is cpupools, designed and implemented by Jürgen Groß at Fujitsu. At its core it’s a simple idea, but one flexible and powerful enough to solve a number of different problems.
The core idea behind cpupools is to divide the physical cores on the machine into different pools. Each of these pools has an entirely separate cpu scheduler, and can be set with different scheduling parameters. At any time, a given logical cpu can be assigned to only one of these pools (or none). A VM is assigned to one pool at a time, but can be moved from pool to pool.
There are a number of things one can do with this functionality. Suppose you are a hosting or cloud provider, and you have a number of customers who have multiple VMs with you. Instead of selling based on CPU metering, you want to sell access to a fixed number of cpus for all of their VMs: e.g. a customer with 6 single-vcpu VMs might buy 2 cores worth of computing space which all of the VMs share.
You could solve this problem by using cpu masks to pin all of the customer’s vcpus to a single set of cores. However, cpu masks do not work well with the scheduler’s weight algorithm — the customer won’t be able to specify that VM A should get twice the cpu time of VM B. Solving the weight issue in a general way is very difficult, since VMs can have any combination of overlapping cpu masks. Furthermore, this extra complication would be there for all users of the credit algorithm, regardless of whether they use this particular mode or not.
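A cpupool handles this scenario cleanly. The following sketch uses xl’s cpupool interface; the cpu numbers, pool name, and VM names are illustrative assumptions, and the exact commands depend on your environment:

```shell
# customer-a.cfg: a cpupool configuration file (illustrative contents):
#   name  = "customer-a"
#   sched = "credit"
#   cpus  = [6, 7]

# Free two cpus from the default pool, then create the customer's pool:
xl cpupool-cpu-remove Pool-0 6
xl cpupool-cpu-remove Pool-0 7
xl cpupool-create customer-a.cfg

# Move the customer's VMs into the pool; their weights now only compete
# with other VMs in the same pool:
xl cpupool-migrate vm1 customer-a
xl cpupool-migrate vm2 customer-a
xl sched-credit -d vm1 -w 512   # vm1 gets twice the default weight (256)
```

Because each pool runs its own scheduler instance, vm1 and vm2 share exactly two cores’ worth of cpu, and the relative weights behave as the customer would expect.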
Xen 4.2 will contain two new scheduling parameters for the credit1 scheduler which significantly increase its configurability and performance for cloud-based workloads: tslice_ms and ratelimit_us. This blog post describes what they do, and how to configure them for best performance.
The timeslice for the credit1 scheduler has historically been fixed at 30ms. This is actually a fairly long time — it’s great for computationally-intensive workloads, but not so good for latency-sensitive workloads, particularly ones involving network traffic or audio.
Xen 4.2 introduces the tslice_ms parameter, which sets the timeslice of the scheduler in milliseconds. This can be set either using the Xen command-line option, sched_credit_tslice_ms, or by using the new scheduling parameter interface to xl:
# xl sched-credit -t [n]
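For example (the flag names follow xl’s sched-credit interface; the values here are illustrative, not tuned recommendations):

```shell
# Boot time: append to the Xen line of your grub entry:
#   sched_credit_tslice_ms=1

# Run time: set the credit scheduler's global parameters through xl:
xl sched-credit -s -t 1      # tslice_ms: a 1ms timeslice
xl sched-credit -s -r 1000   # ratelimit_us: run at least 1000us
                             # before being preempted
xl sched-credit -s           # display the current parameters
```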
One of the fun things about a hackathon is the chance to get everyone together in a room and just talk about crazy ideas you might try at some point in the future.
One of the advantages that a certain competing virtualization technology has over Xen is that you don’t have to reboot to start using it. It’s not that big of a thing, but if you just want to play around with VMs, the additional step of rebooting and probably having to muck about with a grub entry makes it pretty certain that casual users will prefer our competition.
Wouldn’t it be great, someone said, if you could just do “insmod xen” in a running kernel, and have it hoist up the kernel (which is currently running on bare metal), put Xen underneath, and make the currently running kernel into domain 0?
The idea sounds pretty crazy at first, but after some examination, it’s actually quite do-able. In fact, there’s precedent: Windows 2008, apparently, does that when booting into Hyper-V. It may involve a certain amount of switching from bare metal code to PV code; but there’s precedent for that too, in the form of SMP alternatives.
One thing that it would depend upon is another project we’ve been kicking around for a year or so now: running dom0 in an HVM container. That would greatly reduce the amount of PVOPS code necessary to run Linux as dom0, making the “hoist” a lot cleaner.
We have a lot of work to do before this can become a priority, but it’s a project that’s attractive enough that I’m sure someone will pick it up in due time, at which point there’s no technical reason that Xen can’t be as convenient for casual users to begin using as any other virtualization technology out there.