Monthly Archives: March 2013

Xen Orchestra: a Web interface for XCP

Maybe you heard, a few years ago, about a project called Xen Orchestra. It was designed to provide a web interface for the Xen hypervisor with the Xend backend. The project started in 2009 but was paused one year later, due to a lack of time on the part of the original designer. As you can read on the project website, XO is now being re-developed from scratch, but its goal is now to provide an interface for XCP. Why XCP? Because, with XAPI, it offers a full set of features with unmatched possibilities for an Open Source product.

Despite this, XCP lacks a free, simple, open source interface, which is why the project was rebooted. Other interesting projects are now dead (like OpenXenManager, a clone of XenCenter). To avoid this kind of scenario, a clear intention of the XO team is to keep the project alive: a “release often” policy, listening to the community, and offering commercial support to secure the resources the project needs. The original team behind XO has created its own company to sustain XO’s durability. Furthermore, XO is licensed under the AGPL.

Current state

XO was only rebooted in December, but we wanted a first version quickly, at least for testing purposes (architecture validation, feedback, suggestions). That’s why this release is quite light: we focused on the global design, picking the right technologies for the future, because we think it is better to analyse and implement a strong basis rather than do ugly things that could jeopardize the project later. So, we are proud to present XO “Archlute”, the first step in web management for XCP.


Request for input: Extended event channel support

The following has been posted on the xen-devel and xen-users mailing lists.

Executive summary

The number of event channels available for dom0 is currently one of the biggest limitations on scaling up the number of VMs which can be created on a single system. There are two alternative implementations we could choose, one of which is ready now, the other of which is potentially technically superior, but will not be ready for the 4.3 release.

The core question we need to ask the community: how important is it to lift the event channel scalability limit for 4.3? Will waiting until 4.4 limit the uptake of the Xen platform?

Read on for a deeper technical description of the issue and the various solutions.

Xen Document Day: March 25th, 2013

We have another Xen Document Day coming next Monday, which is March 25th.

Xen Document Days are for people who care about Xen Documentation and want to improve it. Everybody who can and wants to contribute is welcome to join!

For a list of items that need work, check out the community-maintained TODO and wishlist. We also have a few beginners’ items on the list. Of course, you can work on anything you like: the list just provides suggestions.

See you on IRC: #xendocs @ freenode!

Xen @ Linaro Connect Asia 2013

Two weeks ago today was the first day of Linaro Connect Asia: the event was held at the Gold Coast Hotel in Hong Kong and was similar in format and attendance to the previous one in Copenhagen. All the major players in the ARM world came together to speak about the future of the industry and to build a healthy Open Source ecosystem for ARM.

This time Ian and I didn’t just listen: we came to present Xen on ARM and to demonstrate the project on stage (a video is available here, but unfortunately it is focused on the audience rather than the speakers). Our talk was on Monday, the first day of the conference, and was very well attended: the room was completely full, and people were standing and sitting in the corridor to listen to us! Since the talk was Xen-specific and rather technical, I am surprised that we managed to raise so much interest; not even good talks at XenSummit get so many attendees. It is also an indication of how active and vibrant the Linaro community is, managing to get so many kernel engineers together in one room.

We explained the basic ideas behind the original Xen architecture and showed the benefits of some of the more advanced configurations, like disaggregating dom0 and running multiple driver domains. For example, one could have a network driver domain with the network card assigned to it, running the network backend; the same can be done for the disk, assigning the SATA controller to a disk driver domain running the block backend (a rough configuration sketch is shown below). Driver domains can be restarted independently, without compromising the stability of the system. They improve scalability because they remove some of the load from dom0. Most importantly, given that they don’t need to be privileged, they improve security and isolation. As a matter of fact, one of the most security-oriented operating systems out there, QubesOS, uses them extensively.

The ARM port of Xen retains these architectural advantages and renovates the overall design by having a single hypercall interface for 32-bit and 64-bit virtual machines and by removing the distinction between PV and HVM guests. The new kind of guest supported by Xen on ARM provides the advantages of both, without any of the disadvantages: it uses PV interfaces for IO, available early at boot, it doesn’t require any emulation (no QEMU), and it exploits the hardware as much as possible. Xen runs entirely in EL2, the hypervisor mode, and uses second stage translation in the MMU to assign memory to virtual machines. It exploits the generic timers to export a virtual timer interface to guests and uses the GIC to inject physical and virtual interrupts into virtual machines. Guest operating systems use HVC to issue hypercalls. No ARM virtualization feature is left unused.
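To give an idea of what such a setup looks like in practice, here is a rough xl configuration sketch (the domain names, kernel path and PCI address are invented, and the exact syntax depends on the toolstack version):

 # netdom.cfg -- the network driver domain: it owns the NIC and runs the network backend
 name   = "netdom"
 kernel = "/boot/vmlinuz-netdom"
 memory = 512
 vcpus  = 1
 pci    = [ '0000:03:00.0' ]    # the physical network card is assigned to this domain

 # guest.cfg -- an ordinary guest whose virtual NIC is backed by netdom instead of dom0
 vif    = [ 'bridge=xenbr0,backend=netdom' ]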

After the presentation we showed Xen running on the Arndale board and 64-bit Xen running on the ARMv8 emulator: both demos went smoothly and the crowd enjoyed watching us play Tetris in a virtual machine on Xen on ARM.

That was Monday, and the activities around Xen were just starting: Mark Heath (XenServer VP of Engineering) and Lars Kurth (Xen.org Community Manager) gave the keynote the following morning (the slides are available here). They talked about the reasons why Xen is important for Citrix and why Xen and ARM make a very good match. Given how significant this project is for Citrix, Mark took the opportunity to announce that Citrix is going to join the Linaro Enterprise Group (LEG) as soon as possible.

A couple of days of meetings with the LEG members and productive hacking sessions in the afternoons followed; Ian managed to boot the first 64-bit Linux guest on a 64-bit hypervisor during that time. A couple of hours on the last day were reserved for a demo session: we had a Xen booth with the two demos we showed on Monday, plus Xen running natively on the Samsung Chromebook as an added bonus. Many people stopped by to watch the demos, ask questions about Xen, and see it running first-hand.

Overall it was a very productive week: I don’t think anybody in the ARM world could dismiss Xen on ARM as vaporware or uninteresting after Linaro Connect Asia 2013. It’s happening, and it’s happening now.

NUMA Aware Scheduling Development Report

Background and Motivation

This blog has already hosted a couple of stories about what is going on in the Xen development community regarding improving Xen’s NUMA support. So, if you are interested in some background and motivation, feel free to check them out.

Long story short, they explain how NUMA is becoming more and more common and that, therefore, it is very important to: (1) achieve a good initial placement when creating a new VM; (2) have a solution that is both flexible and effective enough to take advantage of that placement during the whole VM lifetime. The former basically means: “When starting a new Virtual Machine, which NUMA node should I ‘associate’ it with?”. The latter is more about: “How strongly should the VM be associated with that NUMA node? Could it, perhaps temporarily, run elsewhere?”

NUMA Placement and Scheduling

So, here’s the situation: automatic initial placement has been included in Xen 4.2, inside libxl. This means that, when a VM is created (if that happens through libxl, of course), a set of heuristics decides on which NUMA node its memory should be allocated, and the vCPUs of the VM are statically pinned to the pCPUs of that node.
On the other hand, NUMA aware scheduling has been under development during the last months, and it is going to be included in Xen 4.3. This means that, instead of being statically pinned, the vCPUs of the VM will strongly prefer to run on the pCPUs of that NUMA node, but they can run somewhere else as well… and that is what this status report is all about.
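To make the difference concrete, here is a rough sketch using standard xl commands (just an illustration: the guest name and CPU numbers are invented, and the exact syntax and output depend on your Xen/xl version):

 # Inspect the host NUMA topology that libxl's placement logic works with
 xl info -n

 # See on which pCPUs a guest's vCPUs are running and what their CPU affinity is
 xl vcpu-list mytestvm

 # Hard-pin all of the guest's vCPUs to node 0's pCPUs (assumed to be 0-7 here);
 # this is roughly the effect of the automatic placement in Xen 4.2
 xl vcpu-pin mytestvm all 0-7

With the NUMA aware scheduler in 4.3, the node association is instead kept as a preference (the node affinity) inside the credit scheduler, so a vCPU can still be run on a pCPU outside its preferred node when that helps.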

NUMA Aware Scheduling Development

The development of this new feature started pretty early in the Xen 4.3 development cycle, and it has undergone a couple of major reworks along the way. The very first RFC for it dates back to the Xen 4.2 development cycle, and it already showed interesting performance. However, what was decided at the time was to concentrate only on placement and leave scheduling for the future. After that, v1, v2 and v3 of a patch series entirely focused on NUMA aware scheduling followed. The work was discussed during XenSummit NA 2012, in a talk about future NUMA development in Xen in general (slides here). While at it, a couple of existing scheduling anomalies in the stock credit scheduler were found and fixed (for instance, the one described here).

Right now, we can say we are almost done. In fact, v3 received positive feedback and is basically what is going to be merged, and thus what Xen 4.3 will ship. Actually, there is going to be a v4 (released on xen-devel at about the same time as this blog post), but it only accommodates very minor changes and is 100% functionally equivalent to v3.

Any Performance Numbers?

Sure thing! Benchmarks similar to the ones already described in the previous blog posts have been performed. More specifically, straight from the cover letter of v3 of the patch series, here is what has been done:

I ran the following benchmarks (again):
* SpecJBB is all about throughput, so pinning is likely the ideal
  solution.
* Sysbench-memory is the time it takes for writing a fixed amount
  of memory (and then it is the throughput that is measured). What
  we expect is locality to be important, but at the same time the
  potential imbalances due to pinning could have a say in it.
* LMBench-proc is the time it takes for a process to fork a fixed
  number of children. This is much more about latency than
  throughput, with locality of memory accesses playing a smaller
  role and, again, imbalances due to pinning being a potential
  issue.
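
For reference only, since the post does not give the exact invocations, the second and third benchmarks are typically run with command lines along these lines (the sizes and options here are illustrative, not the ones actually used):

 sysbench --test=memory --memory-total-size=1G run    # sysbench memory throughput
 lat_proc fork                                        # LMbench process-creation latency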

This all happened on a 2-node host, where 2 to 10 VMs (2 vCPUs and 960 MB of RAM each) were executing the various benchmarks concurrently. Here are the results:

 ----------------------------------------------------
 | SpecJBB2005, throughput (the higher the better)  |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 |    2 |  43318.613  | 49715.158 |    49822.545    |
 |    6 |  29587.838  | 33560.944 |    33739.412    |
 |   10 |  19223.962  | 21860.794 |    20089.602    |
 ----------------------------------------------------
 | Sysbench memory, throughput (the higher the better)
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 |    2 |  469.37667  | 534.03167 |    555.09500    |
 |    6 |  411.45056  | 437.02333 |    463.53389    |
 |   10 |  292.79400  | 309.63800 |    305.55167    |
 ----------------------------------------------------
 | LMBench proc, latency (the lower the better)     |
 ----------------------------------------------------
 | #VMs | No affinity |  Pinning  | NUMA scheduling |
 ----------------------------------------------------
 |    2 |  788.06613  | 753.78508 |    750.07010    |
 |    6 |  986.44955  | 1076.7447 |    900.21504    |
 |   10 |  1211.2434  | 1371.6014 |    1285.5947    |
 ----------------------------------------------------

In terms of percent performance increase or decrease, this means NUMA aware scheduling does as follows, compared to no affinity at all and to static pinning (a worked example of how these deltas relate to the raw numbers follows the table):

     ----------------------------------
     | SpecJBB2005 (throughput)       |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     |    2 |   +13.05%   |  +0.21%   |
     |    6 |   +12.30%   |  +0.53%   |
     |   10 |    +4.31%   |  -8.82%   |
     ----------------------------------
     | Sysbench memory (throughput)   |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     |    2 |   +15.44%   |  +3.79%   |
     |    6 |   +11.24%   |  +5.72%   |
     |   10 |    +4.18%   |  -1.34%   |
     ----------------------------------
     | LMBench proc (latency)         |
     | NOTICE: -x.xx% = GOOD here     |
     ----------------------------------
     | #VMs | No affinity |  Pinning  |
     ----------------------------------
     |    2 |    -5.66%   |  -0.50%   |
     |    6 |    -9.58%   | -19.61%   |
     |   10 |    +5.78%   |  -6.69%   |
     ----------------------------------
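
To make the percentages concrete, the deltas appear to be computed relative to the NUMA scheduling figure in the raw results above. For instance, for SpecJBB2005 with 2 VMs, (49822.545 - 43318.613) / 49822.545 ≈ +13.05% versus no affinity, while with 10 VMs, (20089.602 - 21860.794) / 20089.602 ≈ -8.82% versus pinning.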

The tables show that, when the host is not overloaded (where overloaded means ‘more vCPUs than pCPUs’), NUMA scheduling is the absolute best. Not only does it do a lot better than no affinity on the throughput-biased benchmarks and a lot better than pinning on the latency-biased benchmark (especially with 6 VMs), it also equals or beats both under the circumstances that are adverse to it, i.e., it beats or equals pinning on the throughput benchmarks and beats or equals no affinity on the latency benchmark.

When the system is overloaded, NUMA scheduling scores in the middle, as could have been expected. It must also be noted that, when it brings benefits there, they are not as large as in the non-overloaded case. However, this only means that there is still room for more optimization, right? In a little more detail, the current way a pCPU is selected for a vCPU that is waking up couples particularly badly with the new concept of NUMA node affinity. Changing this is not trivial, because it involves rearranging some locks inside the scheduler code, but it is already being worked on. Anyway, even with what we have right now, we are overloading the test box by 20% here (without counting Dom0’s vCPUs!) and still seeing improvements, which is definitely not bad!

What Else Is Going On?

Well, a lot… to the point that it is probably pointless to try to make a list here! We have a NUMA roadmap on our Wiki, which we are trying to keep updated and, more importantly, to honor and fulfill. So, if you are interested in knowing what will come next, go check it out!

Google hosted Xen Hackathon, May 16-17, Dublin

I am pleased to announce the next Xen Hackathon. The Hackathon will be hosted by the Ganeti team at Google and takes place on May 16-17, 2013 at Google’s offices in Dublin, Ireland. You can find the exact address and hotel options on the events page, and you can also request an invite from there. I want to thank Google, and in particular Guido Trotter, for making the Hackathon happen.

The aim of the Hackathon is to give developers the opportunity to meet face to face to discuss development, coordinate, write code and collaborate with other developers, as well as to put names to faces. Given that the Ganeti team will host the event, we expect a strong bias towards cloud integration. This year we expect a little more structure at the Hackathon: we will cover Xen on ARM and Xen 4.4 planning, as well as any topics that you want to discuss. You can add topics that you want to discuss, code on, work on, … to the Hackathon wiki page. You can also comment on what others plan to do or vote on activities in which you want to participate.

As spaces at the Xen Hackathon are tight and in the past we have had people register and then not attend, we will be asking for a small registration fee of $15 this year. This fee will be given to Threshold (an Irish charity that works to prevent homelessness and campaigns for housing as a right), and Citrix will match the amount to fund a social event on the 16th. You will need to cover your own travel, accommodation and other costs such as evening meals. We do have limited travel stipends available for individuals who cannot afford to travel: please contact community dot manager at xen dot org if you need to make use of one.

If you are interested in attending the hackathon, please request an invitation by going to the events page.

See you there!