Tag Archives: subproject

Mirage OS v2.0: The new features

The first release of Mirage OS back in December 2013 introduced the prototype of the unikernel concept, which realised the promise of a safe, flexible mechanism to build highly optimized software stacks purpose-built for deployment in the public cloud (see the overview of Mirage OS for some background). Since then, we’ve been hard at work using and extending Mirage for real projects and the community has been steadily growing.

Today, we’re thrilled to announce the release of Mirage OS v2.0! Over the past few weeks the team has been hard at work writing about all the new features in this latest release, which I’ve been busy co-ordinating. Below are summaries of those features and links to in-depth blog posts where you can learn more:

Thomas Leonard’s Cubieboard2

ARM device support: While the first version of Mirage was specialised towards conventional x86 clouds, the code generation and boot libraries have now been made portable enough to operate on low-power embedded ARM devices such as the Cubieboard 2. This is a key part of our efforts to build a safe, unified multiscale programming model for both cloud and mobile workloads as part of the Nymote project. We also upstreamed the required changes to the Xen Project so that other unikernel efforts such as HaLVM or ClickOS can benefit.

Irmin – distributed, branchable storage: Unikernels usually execute in a distributed, disconnection-prone environment (particularly with the new mobile ARM support). We therefore built the Irmin library to explicitly make synchronization easier via a Git-like persistence model that can be used to build and easily trace the operation of distributed applications across all of these diverse environments.
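
To make the model concrete, here is a purely illustrative OCaml signature for a branchable, Git-like store. The names below are hypothetical and are not Irmin’s actual API; they only sketch the persistence model described above.

  (* Illustrative only: a hypothetical signature for a Git-like,
     branchable key-value store. This is not Irmin's real API. *)
  module type BRANCHABLE_STORE = sig
    type t                           (* a branch of the store *)
    type key = string list           (* hierarchical keys, like paths *)
    type value = string

    val master : unit -> t           (* open the default branch *)
    val clone  : t -> string -> t    (* fork a new, independent branch *)
    val read   : t -> key -> value option
    val update : t -> key -> value -> unit
    val merge  : t -> into:t -> (unit, [ `Conflict of string ]) result
    (* every update becomes a commit, so the history of a distributed
       application can be traced and replayed, Git-style *)
  end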

OCaml TLS: The philosophy of Mirage is to construct the entire operating system in a safe programming style, from the device drivers up. This release continues that approach with a comprehensive OCaml implementation of Transport Layer Security, the most widely deployed end-to-end encryption protocol on the Internet (and one whose implementations have historically been prone to serious security holes). The series of posts is written by Hannes Mehnert and David Kaloper.

Modularity and communication: Mirage is built on the concept of a library operating system, and this release provides many new libraries to flexibly extend applications with new functionality.

  • “Fitting the modular Mirage TCP/IP stack together” by Mindy Preston explains the distinctly modular architecture of our TCP/IP stack, which lets you swap between the conventional Unix sockets API and a complete implementation of TCP/IP in pure OCaml.
  • “Vchan: low-latency inter-VM communication channels” by Jon Ludlam shows how unikernels can communicate efficiently with each other by establishing shared memory rings, forming distributed clusters on a multicore Xen host.
  • “Modular foreign function bindings” by Jeremy Yallop continues the march towards abstraction by explaining how to interface safely with code written in C, without having to write any unsafe C bindings! This forms the basis for allowing Xen unikernels to communicate with existing libraries that they may want to keep at arm’s length for security reasons (see the short sketch after this list).
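
To give a flavour of the ctypes approach, the sketch below binds the C standard library’s puts() directly from OCaml, with no hand-written C stubs. This is a generic, hedged example of the library’s foreign-function interface rather than Mirage-specific code, and it assumes the ctypes library and its Foreign module are installed.

  (* Bind the C standard library's puts() without writing any C stubs.
     Requires the ctypes package with its foreign-function support. *)
  open Ctypes
  open Foreign

  (* describe the C type: char* -> int, and look the symbol up dynamically *)
  let puts = foreign "puts" (string @-> returning int)

  let () =
    let n = puts "Hello from OCaml, via a dynamically bound C function" in
    Printf.printf "puts returned %d\n" n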

All the libraries required for these new features are regularly released into the OPAM package manager, so just follow the installation instructions to give them a spin. A release this size probably introduces minor hiccups that may cause build failures, so we very much encourage bug reports on our issue tracker or questions to our mailing lists. Don’t be shy: no question is too basic, and we’d love to hear of any weird and wacky uses you put this new release to! And finally, the lifeblood of Mirage is about sharing and publishing libraries that add new functionality to the framework, so do get involved and open-source your own efforts.

Summary of the gains of Xen oxenstored over cxenstored

This week, we are reblogging this excellent piece by Luis of SUSE. The article came about because of a discussion Luis had at the Linux Foundation Collaboration Summit in Napa, after which he decided to write down some general background on the xenstore, a review of its first implementation, and a summary of oxenstored. The original post is available here, on Luis’ blog.

OXenstored is the default in Xen, but only if the OCaml compiler was installed at build time. This means that in many Linux distributions that ship Xen, cxenstored is most likely your default. You can check whether oxenstored is in use by looking for an oxenstored process running in your dom0.

It all started with a paper: OXenstored – An Efficient Hierarchical and Transactional Database using Functional Programming with Reference Cell Comparisons

First, a general description of the xenstore and its first implementation. The xenstore is where Xen stores the information about its systems. It covers dom0 and the guests, and it uses a filesystem-like layout, much as the Linux kernel keeps a layout of the system in sysfs. The original xenstored, which the paper refers to as Cxenstored, was written in C. Since all the information needs to be stored in a filesystem-like layout, any library or tool that provides a tree-structured key-value store should suffice to back the xenstore. The Xen folks decided to use the Trivial Database, tdb, which as it turns out was designed and implemented by the Samba folks for their own database. Xen then has a daemon sitting in the background which listens for read/write requests to this database; that is what you see running in the background if you run ‘ps -ef | grep xen’ on dom0. dom0 is the first host, the rest are guests.

dom0 uses Unix domain sockets to talk to the xenstore, while guests talk to it through the kernel via the xenbus. The code for opening a connection to the C version of the xenstore is in tools/xenstore/xs.c and the call is xs_open(). The code first attempts to open the Unix domain socket with get_handle(xs_daemon_socket()) and, if that fails, falls back to get_handle(xs_domain_dev()); the latter varies depending on your operating system, and you can override the former by setting the environment variable XENSTORED_PATH. On Linux the latter is /proc/xen/xenbus.

All the xenstore is doing is brokering access to the database. The xenstore represents all data known to Xen; we build it up at boot and can throw it out the window when shutting down, which is why we should just use a tmpfs for it (Debian does, OpenSUSE should be changed to do so). The actual database for the C implementation is by default stored under the directory /var/lib/xenstored, in a file called tdb. On OpenSUSE that is /var/lib/xenstored/tdb, on Debian (as of xen-utils-4.3) that is /run/xenstored/tdb. The C version of the xenstore therefore produces a database file that can actually be used with tdb-tools (the actual package name on both Debian and SUSE). xenstored does not use libtdb, which is GPLv3+; Xen instead carries its own copy of the tdb implementation, licensed under the LGPL, in tools/xenstore/tdb.c. Although you shouldn’t be using tdb-tools to poke at the database, you can still read from it using these tools; you can dump the entire database as follows:

 tdbtool /run/xenstored/tdb dump

The biggest issue with the C implementation and its reliance on tdb is that you can livelock it if you have a guest or any other entity doing short, quick accesses to the xenstore. We need Xen to scale, though, and the research and development behind oxenstored was an effort to help with that. What follows is my brain dump of the paper. I don’t get into the details of the implementation because, as can be expected, I don’t want to read OCaml code. Keep in mind that if I look for a replacement, I’m also looking for something that the Samba folks might want to consider.

OXenstored has the following observed gains:

  • one fifth the size, in terms of lines of code, of the C xenstored
  • better performance as the number of guests increases; it supports three times as many guests, up to an upper limit of 160 guests

The performance gains come from two things:

  • how it deals with transactions, through an immutable prefix tree. Each transaction is associated with a triplet (T1, T2, p) where T1 is the root of the database just before the transaction, T2 is the local copy of the database with all the updates made by the transaction up to that point, and p is the path to the furthest node from the root of T2 whose subtree contains all the updates made by the transaction up to that point.
  • how it deals with sharing immutable subtrees, using ‘reference cell equality’, a limited form of pointer equality which compares the locations of values instead of the values themselves. Two values are shared if they have the same location. Functional programming languages make it natural for multiple copies of immutable structures to share the same location in memory. oxenstored takes advantage of this to design a trie library which enforces sharing of subtrees as much as possible. This lets them simplify how concurrent transactions are detected and merged/coalesced (see the sketch after this list).
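
A minimal OCaml sketch of these two ideas (illustrative only, not oxenstored’s actual code): an immutable trie in which an update rebuilds only the nodes along the modified path, so every untouched subtree remains physically equal (==) to the corresponding subtree in the previous root, plus the (T1, T2, p) triplet that represents a transaction.

  (* Illustrative sketch, not oxenstored's actual code. *)
  type 'a trie = Node of 'a option * (string * 'a trie) list

  (* Insert a value at [path]. Only the nodes along the path are
     rebuilt; every sibling subtree is reused as-is and therefore
     stays physically equal ( == ) to the subtree in the old root. *)
  let rec insert path v (Node (value, children)) =
    match path with
    | [] -> Node (Some v, children)
    | k :: rest ->
        let child =
          try List.assoc k children with Not_found -> Node (None, [])
        in
        Node (value, (k, insert rest v child) :: List.remove_assoc k children)

  (* A transaction as described in the paper: the root just before the
     transaction (T1), the local copy with its updates (T2), and the
     path (p) to the deepest node whose subtree contains all of them. *)
  type 'a transaction = {
    t1 : 'a trie;
    t2 : 'a trie;
    p  : string list;
  }

  (* 'Reference cell equality': because untouched subtrees are shared,
     comparing locations is enough to tell what a transaction left alone. *)
  let unchanged a b = a == b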

The complexity of the algorithms used by oxenstored depends only on the length of the path, which is rarely over 10. This gives predictable performance regardless of the number of guests present.

Five Chances to Hear About the Xen Project in the Next Two Weeks

Over the next two weeks, there are no less than five great opportunities to hear about the Xen Project.  These include:

The Linux Link Tech Show on May 29, 2013

Xen Project Evangelist Russ Pavlicek will be a guest of The Linux Link Tech Show crew.  This weekly webcast is streamed live at 8:30 PM EDT (GMT-4) worldwide, with downloadable podcasts available soon after the show is over.  They also have a lively IRC during the broadcast, so join in while you listen and maybe answer some questions others may have about Xen!

LinuxCon Japan: “10 Years of Xen and Beyond” on May 30

Xen Project Community Manager Lars Kurth talks about the history and future of the project at LinuxCon Japan.  LinuxCon Japan is a great Linux Foundation event for the Asian region.  And Lars will be available frequently on the expo floor, so stop by and say hello!

Texas Linux Fest: “How to (Almost) Kill a Successful Project and then Bring It Back to Life: Lessons Learned from the Xen Project” on June 1.

Russ speaks to the Texas Linux Fest crowd about how Xen’s early success was nearly undone by disconnection from its community, and how the problems were addressed.  Texas Linux Fest continues to create an excellent event for the south central US.  If you are in the Austin, TX area, give yourself a treat and sign up!

Southeast Linuxfest: June 7 & 9

Russ reprises the talk from Texas to the North Carolina crowd on Friday, then finishes on Sunday with “Xen Project: a 10th Anniversary Status Report”.  SELF has a long history of delivering high quality conferences to those in the US southeast.  If you can make it to Charlotte, NC, make sure you stop in on the sessions and talk to Russ.

We really hope that most folks can attend one of the sessions, or at least download the audio, video, or slides as they are made available.

Remember that these and many other upcoming events can be found in the XenProject.org Events Calendar.  Check there often to see who will be speaking in a venue near you.

 

Welcome Home: Xen moves to a new home built by CloudAccess.net

In April, Xen unveiled a new community site at xenproject.org. Xen Project leaders worked closely with CloudAccess.net in the development of their new online home, built using Joomla.

CloudAccess.net is the official host of demo.joomla.org and one of the countless cloud-based companies that benefit from Xen technologies. Thousands of users launch free trials of the Joomla CMS through the company’s platform every month, and the Xen Hypervisor is at the center of it all. It’s the critical component that provisions compute and allows for Joomla application virtualization.

When the Xen Project needed a new, more collaborative home on the web, project leaders ultimately decided to build it using Joomla. Lars Kurth, Community Manager, said that “in a nutshell, the Joomla back-end is a lot easier to use and to get started with than Drupal. That makes it ideal for a community site where you want volunteers to be able to contribute.” Lars also added that “Joomla is relatively intuitive when you need to figure out how to get stuff done.”

Mark Hinkle said that “CloudAccess.net is in the unique position of having a strong hosting presence combined with an intimate knowledge of the Joomla! CMS + Application Framework, developer ecosystem and the Joomla! open source community.” He further commented that “we chose CloudAccess.net for numerous projects because of their broad knowledge and their dedication to supporting the underlying open source community as well as their skill at developing interactive websites that foster participation from the users of those sites.”

CloudAccess.net CEO Gary Brooks commented that “we were very excited to work with Xen on this project. We obviously share similar open source values and we both contribute to open source collaborative communities. Without Xen, our company wouldn’t exist. We’re defining what we think ‘cloud’ means, and Xen produces the technologies that drive our highly available, scalable applications. We are the prime example of what’s possible with Xen, a poster child of sorts.”

About the author: Ryan Bernstein is the Chief Operating Officer at CloudAccess.net and an adjunct professor of composition and public speaking at Northwestern Michigan College in Traverse City, Michigan.

Xen.org 2011 Year in Review

It truly was an amazing year for Xen.org! The key highlights included dom0 support going into the mainline Linux kernel, Project Kronos, and a renewed focus on Xen for ARM. All three of these projects are examples of standing on the shoulders of giants.

In 2011, Xen.org welcomed Lars Kurth as our community manager. Lars’ impact can be seen most notably in a formalized governance model, a new Xen wiki, and numerous events that Xen.org held and attended (described below).

Technology

Xen.org Events

Xen.org at External Community Events

Ian Pratt gave his thoughts on Xen’s past, present and future at Xen Summit Asia 2011. His slides (http://www.slideshare.net/xen_com_mgr/xenorg-the-past-the-present-and-exciting-future) and a video (http://vimeo.com/33056576) are available.

Linux 3.0 – How did we get initial domain (dom0) support there?

About a year ago (https://lkml.org/lkml/2010/6/4/272) my first patchset that laid the groundwork for enabling the initial domain (dom0) was accepted into the Linux kernel. It was tiny: a total of around 50 new lines of code added. Great, except that it took me seven months to actually get to that stage.

It did not help that the patchset had gone through eight revisions before the maintainer decided that he could sign off on it. Based on those timelines, I figured initial domain support would be ready around 2022 :-)

Fast-forward to today and we have initial domain support in the Linux kernel, with high-performance backends.

So what made it go so much faster (or slower, if you have been waiting for this since 2004)? Foremost was the technical challenge of dealing with code that “works” but hasn’t been cleaned up. This is the same problem that OEMs have when they try to upstream their in-house drivers: the original code “works” but is a headache to maintain and is filled with #ifdef LINUX_VERSION_2_4_3, #else...

To understand why this happens to code, put yourself in the shoes of an engineer who has to deliver a product yesterday. The first iteration is minimal – just what it takes to do the job. The next natural step is to improve the code and clean it up, but then another hot bug lands on the engineer’s lap, and there isn’t enough time to go back and improve the original code. At the end of the day the code is filled with weird edge cases, code paths that seemed right but maybe aren’t anymore, and so on.

The major technical challenge was picking up this code years later, figuring out its peculiarities, its intended purpose and how it diverged from its intended goal, and then rewriting it correctly without breaking anything. The fun part is that it is like giving the code a new life – not only can we do it right, but we can also fix all those hacks the original code had. In the end we (I, Jeremy Fitzhardinge, Stefano Stabellini, and Ian Campbell) ended up cleaning up generic code and providing the Xen-specific code alongside it. That is pretty neat.

Less technical, but also important, was the challenge of putting ourselves in the shoes of a maintainer so that we could write the code to suit the maintainer. I learned this the hard way with the first patchset, where it took a good seven months for me to finally figure out how the maintainer wanted the code to be written – which was “with as few changes as possible.” Meanwhile I had been writing abstract APIs with backend engines – complete overkill. Getting it right the first time really cut down the time for the maintainer to accept the patches.

The final thing is patience – it takes time to make sure the code is solid. More often than not, the third or fourth revision of the code was the one that was pretty and right. This meant that for every revision we had to redo the code, retest it, and get people to review it – and that can take quite some time. The effect was that per merge window (every three months) we tried to upstream only one or maybe two components, as we did not have any other pieces of code ready or did not feel they were ready yet. We now do reviews on the xen-devel mailing list before submitting code to the Linux Kernel Mailing List (LKML) and the maintainer.

So what changed between 2.6.36 (where the SWIOTLB changes were committed) and 3.0 to make the Linux kernel capable of booting as the first guest launched by the Xen hypervisor?

Around 600 patches. Architecturally, the first component was the Xen-SWIOTLB, which allowed the DMA API (used by most device drivers) to translate between guest virtual addresses and physical addresses (and vice versa). Then came the Xen PCI frontend driver and the Xen PCI library (quite important). The latter provided the glue for the PCI API (which mostly deals with IRQ/MSI/MSI-X) to utilize the Xen PCI frontend. This meant that when a guest tried to interrogate the PCI bus for configuration details (which you can see yourself by doing ‘lspci -v’), all those requests would be tunneled through the Xen PCI frontend to the backend. Requests to set up IRQs or MSIs were also tunneled through the Xen PCI backend. In short, we allowed PCI devices to be passed through to paravirtualized (PV) guests and to function there.

Architecture of how device drivers work in Xen DomU PV

The next part was the ACPI code. The ACPI code calls the IRQ layer at bootup to tell it which device has which interrupt (the ACPI _PRT tables). When a device is enabled (loaded), it calls the PCI API, which in turn calls the IRQ layer, which then calls into the ACPI API. The Xen PCI library (which I mentioned earlier) had already provided the infrastructure to route the PCI API calls through, so we extended it and added the ACPI callback. That meant it could use the ACPI API instead of the Xen PCI frontend, as appropriate – and voilà, the interrupts were now enabled properly.

Architecture of how device drivers plumb through when running in Dom0 or PVonHVM mode.

When 2.6.37 was released, the Linux kernel booted under the Xen hypervisor! It was very limited (no backends, not all drivers worked, some IRQs never got delivered), but it kind of worked. Much rejoicing happened on Jan 4th 2011 :-)

Then we started cracking on the bugs and adding infrastructure pieces for the backends. I am not going to go into details – but there were a lot of patches in many, many areas. The first backend to be introduced was xen-netback, which was accepted in 2.6.39. And the second one – xen-blkback – was accepted right after that, in 3.0.

With Linux 3.0 we now have the major components needed to be classified as a working initial domain – aka dom0.

There is still work to do, though – we have not fully worked out ACPI S3 and S5 support, or 3D graphics support – but the majority of the needs will be satisfied by the 3.0 kernel.

I skimped here on the underlying code called paravirt, which Jeremy had been working on tirelessly since 2.6.23 – and which made it all possible – but that is a topic for another article.