Monthly Archives: December 2013

2013: A Year to Remember

2013 has been a year of changes for the Xen Community. I wanted to share my five personal highlights of the year. But before I do, I want to thank everyone who contributed to the Xen Project in 2013 and the years before. Open source is about bringing together technology and people: without your contributions, the Xen Project would not be a thriving and growing open source project.

Xen Project joins Linux Foundation

The biggest community story of 2013 was the move of Xen to the Linux Foundation in April. For me, this journey started in December 2011, when I won in-principle agreement from Citrix to find a neutral, non-profit home for Xen. It took longer than I had hoped: even after the decision was made to become a Linux Foundation Collaborative Project, it took many months of hard work to get everything off the ground. Was it worth it? The answer is a definite yes: besides all the buzz and media interest in April 2013, interest in and usage of Xen has increased throughout the remainder of 2013. The Xen Project became a first-class citizen within the open source community, which it was not really before.

[Figure: Wiki page visits]

Monthly visits by users to the Xen Project wiki doubled after moving Xen to the Linux Foundation.

Of course, the ripples of this change will be felt for many years to come. Some of them are covered in the other four highlights of 2013. I personally believe that the Xen Project Advisory Board (which is made up of 14 major companies that fund the project) will have a positive impact on the community going forward. This will become apparent next year, when initiatives funded by the Advisory Board – such as an independently hosted test infrastructure, more coordinated marketing and PR, growing the Xen talent pool and many others – kick into gear.

Where Would You Like to See the Next Xen Project User Summit Held?

In 2013, we held the first major Xen event aimed specifically at users: the Xen Project User Summit. In 2014, we want to do it again — but where and when?

The Xen Project wants to hold its second Xen Project User Summit. We'd like to hold it somewhere accessible to a large percentage of our user community, and we'd like to schedule it at a time that makes sense, possibly in coordination with an existing conference.

We need your help to pick the time and place. Give us your preferences in a quick two-minute survey found here:

https://www.surveymonkey.com/s/YJQCHJ6

It’s very quick and easy to do.  And you may just find that the next User Summit is too convenient for you to pass up.

What is the ARINC653 Scheduler?

The Xen ARINC 653 scheduler is a real-time scheduler that has been in Xen since 4.1.0. It is a cyclic executive scheduler with a specific usage in mind, so unless you have an aviation background you are unlikely to have encountered it.

The scheduler was created and is currently maintained by DornerWorks.

Background

The primary goal of the ARINC 653 specification [1] is the isolation or partitioning of domains.  The specification goes out of its way to prevent one domain from adversely affecting any other domain, and this goal extends to any contended resource, including but not limited to I/O bandwidth, CPU caching, branch prediction buffers, and CPU execution time.

This isolation is important in aviation because it allows applications at different levels of certification (e.g. Autopilot – Level A Criticality, In-Flight Entertainment – Level E Criticality, etc.) to run in different partitions (domains) on the same platform. Historically, to maintain this isolation, each application had its own separate computer and operating system, in what was called a federated system. Integrated Modular Avionics (IMA) systems were created to allow multiple applications to run on the same hardware. In turn, the ARINC 653 specification was created to standardize an operating system for these platforms. While it is called an operating system and could be implemented as such, it can also be implemented as a hypervisor running multiple virtual machines as partitions. Since the transition from federated to IMA systems in avionics closely mirrors the transition to virtualized servers in the IT sector, the latter implementation seems more natural.

Beyond aviation, an ARINC 653 scheduler can be used where temporal isolation of domains is a top priority, or in security environments with indistinguishability requirements, since a malicious domain should be unable to extract information through a timing side-channel.  In other applications, the use of an ARINC 653 scheduler would not be recommended due to the reduced performance.

Scheduling Algorithm

The ARINC 653 scheduler in Xen provides the groundwork for the temporal isolation of domains from each other. The domain scheduling algorithm itself is fairly simple: a fixed, predetermined list of domains is repeatedly scheduled with a fixed periodicity, resulting in a complete and, most importantly, predictable schedule. The overall period of the scheduler is known as the major frame, while the individual domain execution windows in the schedule are known as minor frames.

[Figure: a major frame divided into minor frames]

As an example, suppose we have three domains with periods of 5 ms, 6 ms and 10 ms and worst-case running times of 1 ms, 2 ms and 3 ms respectively. The major frame is set to the least common multiple of these periods (30 ms), and minor frames are selected so that the period, runtime and deadline constraints are met. One resulting schedule is shown below, though there are other possibilities.

[Figure: an example schedule for the three domains above]
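To make the arithmetic concrete, here is a small, purely illustrative OCaml sketch (nothing below is Xen code; the real scheduler inside the hypervisor is written in C) that computes the major frame for the example above and checks the necessary condition that the domains' combined demand fits within it:

(* Hypothetical sketch of the cyclic-executive arithmetic only. *)
type domain = { period_ms : int; wcet_ms : int }

let rec gcd a b = if b = 0 then a else gcd b (a mod b)
let lcm a b = a * b / gcd a b

(* The major frame is the least common multiple of the domain periods. *)
let major_frame doms =
  List.fold_left (fun acc d -> lcm acc d.period_ms) 1 doms

(* Necessary (though not sufficient) feasibility check: the total CPU time
   demanded over one major frame must fit within the major frame. *)
let demand_fits doms =
  let major = major_frame doms in
  let demand =
    List.fold_left
      (fun acc d -> acc + (major / d.period_ms) * d.wcet_ms)
      0 doms
  in
  demand <= major

let () =
  let doms =
    [ { period_ms = 5;  wcet_ms = 1 };   (* domain 1 *)
      { period_ms = 6;  wcet_ms = 2 };   (* domain 2 *)
      { period_ms = 10; wcet_ms = 3 } ]  (* domain 3 *)
  in
  (* Prints: major frame = 30 ms, demand fits = true
     (demand = 6x1 + 5x2 + 3x3 = 25 ms out of the 30 ms major frame). *)
  Printf.printf "major frame = %d ms, demand fits = %b\n"
    (major_frame doms) (demand_fits doms)

The remaining 5 ms of each major frame in this example is simply left idle, which is exactly the kind of deliberate trade of throughput for predictability that the scheduler is designed around.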

The ARINC 653 scheduler is only concerned with the scheduling of domains. The scheduling of real-time processes within a domain is performed by that domain’s process scheduler.  In a compliant ARINC 653 system, these processes are scheduled using a fixed priority scheduling algorithm, but if ARINC 653 compliance is not a concern any other process scheduling method may be used.

Using the Scheduler

Directions for using the scheduler can be found on the Xen wiki at ARINC653 Scheduler. When using the scheduler, the most obvious effect is that the CPU usage and execution windows for each domain are fixed, regardless of whether the domain is performing any work.

Multicore operation of the scheduler is not currently supported. Extending the scheduling algorithm to multiple cores is trivial, but the isolation of domains in a multicore system requires a number of mitigation techniques not needed in single-core systems [2].

References

[1] ARINC Specification 653P1-3, “Avionics Application Software Standard Interface Part 1 – Required Services” November 15, 2010

[2] EASA.2011/6 MULCORS – Use of Multicore Processors in airborne systems

Announcing the 1.0 release of Mirage OS

We’re very pleased to announce the release of Mirage OS 1.0. This is the first major release of Mirage OS and represents several years of development, testing and community building. You can get started by following the install instructions and creating your own webserver to host a static website! Also check out the release notes and download page.

What is Mirage OS and why is it important?

Most applications that run in the cloud are not optimized to do so. They inherently carry assumptions about the underlying operating system with them, including vulnerabilities and bloat.

Compartmentalization of large servers into smaller ‘virtual machines’ has enabled many new businesses to get started and achieve scale. This has been great for new services, but many of those virtual machines are single-purpose: they contain largely complete operating systems yet typically run a single application such as a web server, load balancer, database or mail server. This means a large part of the footprint is unused and unnecessary, which is both costly in resources (RAM, disk space and so on) and a security risk due to the increased complexity of the system and the larger attack surface.

[Figure: Cloud OS diagram]

On the left is a typical application stack as run in the cloud today. Cloud operating systems such as Mirage OS remove the traditional operating system and replace it with a language runtime designed to cooperate with the hypervisor.

Mirage OS is a Cloud Operating System which represents an approach where only the necessary components of the operating system are included and compiled along with the application into a ‘unikernel’. This results in highly efficient and extremely lean ‘appliances’, with the same or better functionality but a much smaller footprint and attack surface. These appliances can be deployed directly to the cloud and embedded devices, with the benefits of reduced costs and increased security and scalability.

Some example use cases for Mirage OS include: (1) a lean web server (the openmirage.org website, for example, is about 1MB including all content, boots in about one second and is hosted on Amazon EC2); (2) middle-box applications such as small OpenFlow switches for tenants in a cloud provider; (3) reuse of the same code and toolchain that create cloud appliances to target space- and memory-constrained ARM devices.

How does Mirage OS work?

Mirage OS works by treating the Xen hypervisor as a stable hardware platform and using libraries to provide the services and protocols we expect from a typical operating system, e.g. a networking stack. Application code is developed in the high-level functional programming language OCaml on a desktop OS such as Linux or Mac OS X, and compiled into a fully standalone, specialized unikernel. These unikernels run directly on the Xen hypervisor APIs. Since Xen powers most public clouds, such as Amazon EC2, Rackspace Cloud and many others, Mirage OS lets your servers run more cheaply, more securely and faster on those services.

Mirage OS is implemented in the OCaml language, with 50+ libraries which map directly to operating system constructs when compiled for production deployment. The goal is to make it as easy as possible to create Mirage OS appliances while ensuring that everything found in a typical operating system stack is still available to the developer. Mirage OS includes clean-slate functional implementations of protocols including TCP/IP, DNS, SSH, OpenFlow (switch/controller), HTTP, XMPP and Xen Project inter-VM transports. Since everything is written in a single high-level language, it is easier to work with those libraries directly. This approach delivers the best possible performance of Mirage OS on the Xen hypervisor without needing to support the thousands of device drivers found in a traditional OS.
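As a flavour of what writing against those libraries looks like, here is a minimal console "hello world" appliance, loosely following the tutorials of the Mirage 1.x era. Module and signature names such as Unikernel.Main and V1_LWT.CONSOLE reflect the conventions of that time and shifted between releases, so treat this as a sketch rather than copy-paste material; the install instructions linked above are authoritative.

(* unikernel.ml: the application logic, written against Mirage's module
   signatures rather than against a concrete operating system. *)
open Lwt

module Main (C : V1_LWT.CONSOLE) = struct
  (* start is the entry point of the appliance; it receives whatever
     devices config.ml wires up, in this case just a console. *)
  let start c =
    C.log_s c "hello" >>= fun () ->
    C.log_s c "world"
end

(* config.ml: tells the mirage tool how to assemble the appliance and
   which device implementation to plug into Unikernel.Main. *)
open Mirage

let main = foreign "Unikernel.Main" (console @-> job)

let () =
  register "console" [ main $ default_console ]

From this pair of files the mirage command-line tool generates the build rules to produce either a Unix binary, which is handy during development, or a standalone Xen unikernel for deployment.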

[Figure: BIND 9 vs. Mirage OS throughput comparison]

Performance comparison of Bind 9 vs. a DNS server written in Mirage OS.

An example of a Mirage OS appliance is a DNS server; the chart above compares it with BIND 9, one of the most widely deployed DNS servers on the internet. As you can see, the Mirage OS appliance outperforms BIND 9, and in addition the Mirage OS VM is less than 200kB in size, compared to over 450MB for the BIND VM. Moreover, the traditional VM contains 4-5 times more lines of code than the Mirage implementation, and lines of code are often considered correlated with attack surface. More detail about this comparison and others can be found in the associated ASPLOS paper.

For the DNS appliance above, the application code was written in OCaml and compiled with the relevant Mirage OS libraries. To take full advantage of Mirage OS it is necessary to design and construct applications in OCaml, which brings a number of additional benefits such as type safety. For those new to OCaml, there are some excellent resources to get started with the language, including a new book from O'Reilly and a range of tutorials on the revamped OCaml website.

We look forward to the exciting wave of innovation that Mirage OS will unleash including more resilient and lean software as well as increased developer productivity.

Xen on ARM and the Device Tree vs. ACPI debate

ACPI vs. Device Tree on ARM

Some of you may have seen the recent discussions on the linux-arm-kernel mailing list (and others) about the use of ACPI vs. DT on the ARM platform. As always, LWN has a pretty good summary of the situation with ACPI on ARM (currently subscriber-only; it becomes freely available on 5 December).

Device Tree (DT) and the Advanced Configuration & Power Interface (ACPI) are both standards for describing a hardware platform, e.g. to an operating system kernel. At their core, both technologies provide a tree-like data structure containing a hierarchy of devices, specifying what type each device is, together with a set of “bindings” for that device. A binding is essentially a schema for specifying I/O regions, interrupt mappings, GPIOs, clocks and so on.
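To make that abstract description slightly more concrete, here is a toy OCaml model of such a tree. It is purely illustrative: real platforms express this in Device Tree source files or ACPI tables, and the node and values below are invented for the example.

(* Illustrative model of "a hierarchy of devices plus bindings";
   not a real Device Tree or ACPI representation. *)
type binding = {
  compatible : string list;           (* which driver(s) the node matches *)
  reg        : (int64 * int64) list;  (* MMIO regions as (base, size)     *)
  interrupts : int list;              (* interrupt numbers                *)
}

type node = {
  name     : string;
  binding  : binding option;          (* root/bus nodes may have none     *)
  children : node list;               (* this is what makes it a tree     *)
}

(* A made-up fragment: a single UART hanging off the root node. *)
let example_platform =
  { name = "/"; binding = None;
    children =
      [ { name = "serial@1c090000";
          binding = Some { compatible = [ "arm,pl011" ];
                           reg = [ (0x1c090000L, 0x1000L) ];
                           interrupts = [ 37 ] };
          children = [] } ] }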

For the last few years Linux on ARM has been moving away from hardcoded “board files” (a bunch of C code for each platform) towards using Device Tree instead. In the ARM space ACPI is the new kid on the block and comes with many unknowns. Given this, the approach the Linux kernel maintainers appear to have settled on, which is essentially to wait and see how the market pans out, seems sensible.

On the Xen side, we started the port to ARM around the time that Linux's transition from board files to Device Tree was beginning, and we decided early on to go directly to Device Tree (ACPI wasn't even on the table at that point, at least not publicly). Xen uses DT to discover all of the hardware in the system, both the hardware it intends to use itself and the hardware it intends to pass to domain 0. As well as consuming DT itself, Xen also creates a filleted version of the host DT which it passes to the domain 0 kernel. DT is simple, yet powerful enough to allow us to do this relatively easily.

DT is also used by some of the BSD variants in their ARM ports.

My Position as Xen on ARM Maintainer

The platform configuration mechanism supported by Xen on ARM today is Device Tree. Device Tree is a good fit for our requirements and we will continue to support it as our primary hardware description mechanism.

Given that a number of operating system vendors and hardware vendors care about ACPI on ARM and are pushing hard for it, especially in the ARM server space, it is possible, perhaps even likely, that we will eventually find ourselves needing to support ACPI as well. On systems which support both ACPI and DT we will continue to prefer Device Tree. Once ARM hardware platforms that only support ACPI are available, we will obviously need to support ACPI.

The Xen Project works closely with the Linux kernel and other open source upstreams, as well as organisations such as Linaro. Before Xen on ARM supports ACPI, I would like to see it gain some actual traction on ARM; in particular, I would like to see it get to the point where it has been accepted by the Linux kernel maintainers. It is clearly not wise for Xen to pioneer the use of ACPI before it becomes clear whether or not it is going to gain any traction in the wider ecosystem.

So if you are an ARM silicon or platform vendor and you care about virtualization and Xen in particular, I encourage you to provide a complete device tree for your platform.

Note that this only applies to Xen on ARM. I cannot speak for Xen on x86 but I think it is pretty clear that it will continue to support ACPI so long as it remains the dominant hardware description on that platform.

It should also be noted that ACPI on ARM is primarily a server-space thing at this stage. Of course Xen and Linux are not just about servers: both projects have sizable communities of embedded vendors (on the Xen side we had several interesting presentations on embedded uses of Xen on ARM at the recent Xen Developer Summit). Essentially no one is suggesting that the embedded use cases should move from DT to ACPI, so irrespective of what happens with ACPI, DT has a strong future on ARM.

ACPI and Type I Hypervisors

Our experience on x86 has shown that the ACPI model is not a good fit for Type I hypervisors such as Xen, and the same is true on ARM. ACPI essentially enforces a model where the hypervisor, the kernel, the OSPM (the ACPI term for the part of an OS which speaks ACPI) and the device drivers must all reside in the same privileged entity. In other words, it effectively mandates a single monolithic entity which controls everything about the system. This obviously precludes such things as dividing hardware into that which is owned and controlled by the hypervisor and that which is owned and controlled by a virtual machine such as dom0. This impedance mismatch is probably not insurmountable, but experience with ACPI on x86 Xen suggests that the resulting architecture would not be very agreeable.

UEFI

Due to their shared history on x86, ACPI and UEFI are often lumped together as a single thing, when in reality they are mostly independent. There is no reason why UEFI cannot also be used with Device Tree. We expect Xen to support UEFI sooner rather than later.