Improving event channel scalability

As machines grow more powerful, people expect more from the hardware. From a cloud user’s perspective, it is better to run more virtual machines on a single host. Xen currently supports running hundreds of guests on a single physical machine, but we plan to take it to the next order of magnitude: thousands of guests on a single host. To achieve this goal, we need to tackle several bottlenecks, ranging from the hypervisor to the guest kernel to the user space toolstacks. One of the most notable and trickiest of these improvements is event channel scalability, which should enable Dom0 and driver domains to handle more events than before and thus support more guests.

What exactly is an event channel? In short, an event channel is a primitive provided by Xen for event notifications. A typical guest such as Linux can map four kinds of events onto event channels: inter-domain notifications, IPIs (inter-processor interrupts), VIRQs (virtual IRQs) and PIRQs (physical IRQs).

Xen models devices as a frontend, which runs in the guest, and a backend, which runs in Dom0 or a driver domain. Notifications between frontends and backends travel over inter-domain event channels, so once the backend domain runs out of event channels it cannot support any more guests. A typical guest equipped with one virtual network interface (vif) and one virtual disk (vbd) consumes four event channels in the backend domain: one for the console, one for xenstore, one for the vif and one for the vbd. As a rough estimate, a backend domain can support up to (NR_EVENT_CHANNELS_SUPPORTED / 4) guests, and that does not even account for the IPIs, VIRQs and PIRQs the backend domain itself may need. NR_EVENT_CHANNELS_SUPPORTED is 4096 for a 64-bit backend domain and 1024 for a 32-bit one, which yields fewer than 1024 guests for a 64-bit backend domain and fewer than 256 for a 32-bit one. So currently we can support hundreds of guests at best, and it gets worse when a guest has several vifs and vbds.
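The estimate above is easy to reproduce. The following is a minimal sketch of the arithmetic only; the function names and the `reserved` parameter (standing in for the backend domain's own IPIs, VIRQs and PIRQs) are made up for illustration and are not part of Xen.

```python
# Back-of-the-envelope estimate of how many guests one backend domain can serve.
# All names here are hypothetical; only the arithmetic comes from the text.

def channels_per_guest(vifs=1, vbds=1):
    """Event channels one guest consumes in the backend domain:
    one for the console, one for xenstore, plus one per vif and per vbd."""
    return 2 + vifs + vbds

def max_guests(nr_event_channels, vifs=1, vbds=1, reserved=0):
    """Upper bound on guests, after setting aside `reserved` channels
    (an assumed allowance for the backend's own IPIs, VIRQs and PIRQs)."""
    usable = max(nr_event_channels - reserved, 0)
    return usable // channels_per_guest(vifs, vbds)

print(max_guests(4096))                  # 64-bit backend, 1 vif + 1 vbd: 1024
print(max_guests(1024))                  # 32-bit backend, 1 vif + 1 vbd: 256
print(max_guests(4096, vifs=2, vbds=2))  # two vifs and two vbds per guest: 682
```

Any non-zero `reserved` value pushes these upper bounds lower, which is why the practical limit is in the hundreds.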

To improve event channel scalability, two designs have been proposed: a 3-level event channel, which extends the current 2-level mechanism, and a ground-up FIFO-based approach. The 3-level design comes with a prototype (Xen patches, Linux patches), while the FIFO-based event channel ABI is currently under discussion on the mailing list (PDF version of draft V2). No final decision has been made yet, but the ongoing work is worth sharing with the public.

The 3-level event channel design builds on the existing 2-level design by simply extending it with one more level, which is straightforward. It supports up to 256K event channels for a 64-bit backend domain and 32K event channels for a 32-bit backend domain, which in practice should be more than enough for the long run. The downsides of this design are:

  1. it consumes global mapping address space, which is only 1G
  2. a flat bitmap of events provides no priorities

Problem 1 is straightforward to work around: a DomU will never need that many event channels, so we simply enable 3-level event channels only for Dom0 and driver domains. Problem 2 is inherent in the design and not easily solvable.
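The capacity figures for the 2-level and 3-level designs follow from a bitmap hierarchy in which each bit of a word at one level selects one word at the level below, so capacity is the word size raised to the number of levels. The sketch below illustrates that relationship and how a port number decomposes into one bit index per level. It is a simplified model, not the code from the patches, and the function names are hypothetical.

```python
# Simplified model of an N-level event channel bitmap (illustrative only).
# Assumption: every level fans out by the machine word size in bits.

def max_channels(word_bits, levels):
    """Number of event channels addressable with `levels` bitmap levels."""
    return word_bits ** levels

def split_port(port, word_bits, levels):
    """Decompose a port number into one bit index per level, top level first."""
    if not 0 <= port < max_channels(word_bits, levels):
        raise ValueError("port out of range for this layout")
    indices = []
    for _ in range(levels):
        indices.append(port % word_bits)  # bit position within this level's word
        port //= word_bits                # move up to the selecting level
    return indices[::-1]

print(max_channels(64, 2))     # current 2-level, 64-bit: 4096
print(max_channels(32, 2))     # current 2-level, 32-bit: 1024
print(max_channels(64, 3))     # 3-level, 64-bit: 262144 (256K)
print(max_channels(32, 3))     # 3-level, 32-bit: 32768 (32K)
print(split_port(130, 64, 3))  # [0, 2, 2]
```

A flat hierarchy like this also makes downside 2 visible: the position of a bit is determined purely by the port number, so there is no room to express that one event matters more than another.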

The FIFO-based design starts from scratch and has the following advantages over the 3-level design:

  1. event priorities
  2. FIFO ordering to ensure fairness between events
  3. a configurable number of event channels per guest, to reduce memory footprint
  4. a consistent ABI across 32-bit and 64-bit guests and hypervisors
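To make the first two points concrete, here is a toy model of per-priority FIFO queues. It is only a sketch of the general idea and does not reflect the draft ABI, which is still under discussion. The class name, the number of priorities, and the convention that priority 0 is the highest are all assumptions made for illustration.

```python
from collections import deque

class EventQueues:
    """Toy model: one FIFO queue per priority level (not the draft ABI)."""

    def __init__(self, num_priorities=4):   # number of priorities is an assumption
        self.queues = [deque() for _ in range(num_priorities)]
        self.pending = set()                 # ports currently sitting in a queue

    def raise_event(self, port, priority):
        """Queue a port at the given priority; a port already pending is not re-queued."""
        if port in self.pending:
            return False
        self.queues[priority].append(port)
        self.pending.add(port)
        return True

    def next_event(self):
        """Pop the oldest port from the highest-priority non-empty queue."""
        for queue in self.queues:            # index 0 is treated as highest priority
            if queue:
                port = queue.popleft()
                self.pending.discard(port)
                return port
        return None

q = EventQueues()
q.raise_event(10, priority=2)
q.raise_event(11, priority=2)
q.raise_event(5, priority=0)
print(q.next_event(), q.next_event(), q.next_event())  # 5 10 11
```

Within one priority, events are delivered in arrival order, so a busy low-numbered port cannot starve the others the way it can when a flat bitmap is always scanned from bit 0.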

The main downside of the FIFO-based design is, well, that it is not yet implemented. Many details still need to be pinned down (such as avoiding excessive use of global mapping address space) and benchmarks need to be run. Event handling code is complex by nature, and we should be patient while this design matures.

From a user’s point of view, either design should improve event channel scalability and take the number of guests Xen can support to the next order of magnitude. The 3-level design is proposed to meet the time frame of the 4.3 release because of its simplicity. The FIFO-based design is proposed because we are defining a new ABI anyway, and our developers think it is high time to start from scratch. The door is still open: feel free to share your views, concerns and suggestions on the Xen development list.