Tag Archives: Linux

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. x86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce that attack surface, but doing so is very difficult and requires intimate knowledge of the app running inside; at any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application packaged to be self-contained, complying with the OCI Image Format or Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result a Linux system exposes a very high number of calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle to do so, but hypervisors are very accomplished in this respect. After all, hypervisors don’t need full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps. They cause duplication of functionality between the guest and the host. For example, the network stack is traversed twice, first in DomU, then in Dom0. This is unnecessary. It is better to raise the hypervisor abstractions by paravirtualizing a small set of syscalls directly.
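To make “restricting syscalls by default” concrete, here is a minimal sketch of a per-application syscall whitelist using libseccomp, the same kernel mechanism container runtimes build on. The specific whitelist is an assumption made for illustration; a real filter has to enumerate every syscall the app legitimately needs, which is exactly why this is hard to get right.

    /* Minimal sketch of a syscall whitelist with libseccomp (link with -lseccomp).
     * The allowed syscalls below are illustrative assumptions only. */
    #include <seccomp.h>
    #include <unistd.h>

    int main(void)
    {
        /* Default action: kill the process on any syscall not explicitly allowed. */
        scmp_filter_ctx ctx = seccomp_init(SCMP_ACT_KILL);
        if (ctx == NULL)
            return 1;

        /* Allow only the syscalls this particular program is known to need. */
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(read), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(write), 0);
        seccomp_rule_add(ctx, SCMP_ACT_ALLOW, SCMP_SYS(exit_group), 0);

        if (seccomp_load(ctx) != 0) {
            seccomp_release(ctx);
            return 1;
        }
        seccomp_release(ctx);

        /* From here on, any other syscall (open, socket, ...) kills the process. */
        write(STDOUT_FILENO, "filter active\n", 14);
        return 0;
    }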

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls are at a higher level and made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0. I also wrote a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring. The backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket, and the ring size is configurable on a per-socket basis. Received data is copied to the ring by the backend, while data to be sent is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
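To give an idea of the shape of the protocol, here is a rough sketch of what a command-ring request and response could look like. The structure and field names below are illustrative assumptions only; the authoritative layout is the one in the PV Calls specification posted on xen-devel.

    /* Illustrative only: a hypothetical layout for a PV Calls command-ring
     * request/response pair. Real structures and names are defined in the
     * protocol specification on xen-devel. */
    #include <stdint.h>

    #define PVCALLS_OP_SOCKET   0
    #define PVCALLS_OP_CONNECT  1
    #define PVCALLS_OP_LISTEN   2
    #define PVCALLS_OP_ACCEPT   3
    /* ... one opcode per forwarded call (recvmsg, sendmsg, poll, ...) */

    struct pvcalls_request {
        uint32_t req_id;          /* echoed back in the response */
        uint32_t cmd;             /* one of the PVCALLS_OP_* opcodes */
        union {
            struct {
                uint32_t domain;  /* e.g. AF_INET */
                uint32_t type;    /* e.g. SOCK_STREAM */
                uint32_t protocol;
            } socket;
            struct {
                uint64_t sock_id;     /* identifies the socket on both ends */
                uint32_t grant_ref;   /* grant for the new per-socket data ring */
                uint32_t evtchn;      /* event channel used for notifications */
                uint8_t  addr[28];    /* sockaddr_in/in6, copied verbatim */
                uint32_t addr_len;
            } connect;
            /* ... listen, accept, poll, release, ... */
        } u;
    };

    struct pvcalls_response {
        uint32_t req_id;          /* matches the originating request */
        uint32_t cmd;
        int32_t  ret;             /* 0 or a negative errno, like the syscall itself */
    };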

We are still running virtual machines, but mainly to restrict the vast majority of the application’s syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored for container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to be made from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 gbit/sec with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 gbit/sec and outperform not just netfront/netback but even Docker.

The benchmarks have been run on an Intel Xeon D-1540 machine, with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vCPUs each, pinned. DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very appreciated!

Xen & Docker: Made for Each Other!

By Olivier Lambert

Containers and hypervisors are often seen as competing technologies – enemies even. But in reality the two technologies are complementary and increasingly used together by developers and admins. This recent Linux.com article talked about this supposed battle, noting however that developers are using Docker in traditional VMs to bolster security. Containers allow users to develop and deploy a variety of applications with incredible efficiency, while virtualization provides isolation that limits exposure to outside attacks.

Uniting these technologies helps developers and system administrators be even more efficient. Let’s take a closer look at how to achieve this with Docker and Xen Project virtualization, and why we expect more and more organizations to use them together in the near future. This will also be a key topic at the September 15 Xen Project User Summit at the Lighthouse Executive Conference Center in New York City. Register today to learn more about enabling Docker in Xen environments for a truly open infrastructure.


Caption: Xen In Action: lifting Docker, which is lifting containers. I heard you like boats, so I put boats on your boat :).

Who’s Who: What is Xen Project Virtualization?

The Xen Project Hypervisor is a mature virtualization technology used by many of the world’s largest cloud providers, such as AWS, Verizon Terremark, Rackspace and many more. First released in 2003, Xen Project virtualization is proven as a highly reliable, efficient and flexible hypervisor for a range of environments, running on everything from x86 to ARM.

It’s now completely integrated into the upstream Linux kernel, and the project is hosted by the Linux Foundation. The same big cloud users mentioned above also contribute regularly to the project, along with many of the world’s largest technology companies, including Citrix, Cavium, Intel, Oracle and more.

Feature updates and broader community collaboration are on the upswing too: more commits, more communication, better integration, new use cases, and simpler and more powerful modes, such as PVHVM and now PVH, as outlined in this recent blog.

The core Xen Project team takes security seriously. The technology has also been battle-tested by many in the defense industry including the NSA. Xen Project users have benefited from this for years, and developers building, shipping and running distributed applications will profit as well.


What are XenServer and Xen Orchestra?

XenServer is a packaged product consisting of the Xen Project Hypervisor and the Xen Project Management API (XAPI) toolstack within a performance-tuned CentOS distribution. It’s free and can be installed in just a few minutes; you can download it from http://xenserver.org/open-source-virtualization-download.html.

Xen Orchestra (XO) is a simple but powerful web interface that works out of the box with XenServer, or with any host running Xen and XAPI (the most advanced API for Xen). Take a look at the project website to learn more. Both of these tools are, of course, free software.

What is Docker?

In its own words, Docker defines itself as an open platform for developers and sysadmins to build, ship, and run distributed applications. Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments.


Main Advantages:

  • fast (a container boots in milliseconds)
  • simple to use, even in complex workflows
  • lightweight (containers share the host kernel)
  • high container density on one host

The other side of the coin:

  • all containers rely on the same kernel (weaker isolation and security)
  • less maturity than a traditional hypervisor (Docker is still young)
  • containers use the same OS as the host (less diversity than with hypervisors)
  • some friction between developers and admins about its usage: not Docker’s fault, more the classic friction that appears when you bring new toys to your devs. :) We’ll see why, and how to cope with it, below.

Best of Both Worlds

An ideal world would:

  • Let admins do their admin stuff without constraints and/or exposure to dangerous things.
  • Let developers do their developer stuff without constraints and/or exposure to dangerous things.

Fluid Workflow

In other words, they’d be able to create really cool workflows. For example:

  • An admin should be able to easily create a Docker-ready VM running in a hypervisor, with the exact amount of resources needed at a given point in time (he knows the total amount of resources available), e.g. a VM with 2 CPUs and 4 GB of RAM.
  • He should be able to delegate this Docker-ready VM, just as simply, to the dev team.
  • Developers can use it and play with their new toy, without any chance of breaking anything other than the VM itself. The VM is actually a sandbox, not a jail; developers can create containers as they need in this scenario.

Now you can easily imagine other exciting things such as:

  • An admin can delegate snapshot rollback control to a developer. If the developer breaks the VM, he can roll back to the “clean” snapshot without bothering the admin staff. Live, die, repeat!
  • Need to clone the same container for other tests? One click in a web interface.
  • Need to extend the resources of the current VM? One click, done live.
  • Ideally, let a developer create his containers from the same web interface.

Xen Orchestra: A Bridge Between Docker and Xen Project Hypervisor 

So how do we do all this without creating a brand new tool? As you may guess, the answer is Xen Orchestra, which today achieves much of this. Updates planned for later this year and 2015 will deliver even more efficiencies between the two technologies.

What XO Does Today

  • Adjust Resources Live: You can reduce or raise the number of CPUs, the amount of RAM, etc., while the VM is running! Doing this, you can grow or shrink the footprint of your Docker VM without interrupting the service. Check it out in this short video.
  • Snapshots and Rollback: Snapshots and rollback have been fully operational since XO 3.3. Check out how this works in this feature presentation. Coupled with Docker, this is very helpful. When your fresh Dockerized VM is ready, take a snapshot. Then you can roll back whenever you want to return to this clean state. All with just a few clicks and in a few seconds.

Coming Soon

  • Docker-Ready Templates in One Click: This feature will be released this year. In a few words, you can request our template directly from your XO interface; it will be downloaded and operational in your own infrastructure, with Docker listening and ready for action, in the resources you choose to allocate (CPU, RAM, disk). No installation: it works out of the box. Read more in this article.
  • ACL and Delegation: The perfect workflow rests on ACL integration in Xen Orchestra, which is our current priority. In our case, it allows VM delegation to your team using Docker; the VM can be rolled back or rebooted without asking you. More info here.
  • Docker Control from XO: Because we can get the IP of a VM thanks to its Xen tools, we should be able to send commands to the Docker API directly through XO. In this way, you’ll just have to use one interface for Docker AND Xen (at least for simple Docker operations), and get the best of XO for both: ACLs, visualization, etc. A rough sketch of the idea is shown below. This last feature is not on our current roadmap, but will probably pop up early in 2015!
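As a hypothetical illustration of that last point, the sketch below lists a VM’s containers by calling the Docker Remote API with libcurl, assuming the Docker daemon inside the VM exposes its API on TCP port 2375. The /containers/json endpoint is part of Docker’s Remote API, but the address used and the way XO would wire this up are assumptions for the example.

    /* Minimal sketch: list containers via the Docker Remote API using libcurl.
     * Assumes the Docker daemon in the VM listens on TCP port 2375; how Xen
     * Orchestra would actually integrate this is not specified here. */
    #include <curl/curl.h>
    #include <stdio.h>

    static size_t print_body(void *ptr, size_t size, size_t nmemb, void *userdata)
    {
        /* Just dump the JSON response to stdout. */
        fwrite(ptr, size, nmemb, stdout);
        return size * nmemb;
    }

    int main(void)
    {
        curl_global_init(CURL_GLOBAL_DEFAULT);
        CURL *curl = curl_easy_init();
        if (!curl)
            return 1;

        /* The VM's IP address would come from the Xen tools, as described above;
         * 192.0.2.10 is a placeholder documentation address. */
        curl_easy_setopt(curl, CURLOPT_URL,
                         "http://192.0.2.10:2375/containers/json?all=1");
        curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, print_body);

        CURLcode res = curl_easy_perform(curl);
        if (res != CURLE_OK)
            fprintf(stderr, "request failed: %s\n", curl_easy_strerror(res));

        curl_easy_cleanup(curl);
        curl_global_cleanup();
        return res == CURLE_OK ? 0 : 1;
    }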


Caption: Coming soon — deeper integration between Docker and Xen.

Conclusion

Docker is a really promising and growing technology. With Docker and Xen on the same team, the two technologies work in tandem to create an extremely efficient, best-of-breed infrastructure. Finally uniting them in one interface is a big leap ahead!

Any questions or comments? Go ahead!

By Olivier Lambert, Creator of Xen Orchestra Project

 

Xen Orchestra: a Web interface for XCP

Maybe you heard, a few years ago, of a project called Xen Orchestra. It was designed to provide a web interface for the Xen hypervisor with the Xend backend. The project started in 2009, but paused one year later due to lack of time on the part of the original designer. As you can read on the project website, XO is now being redeveloped from scratch, but its goal now is to provide an interface for XCP. Why XCP? Because, with XAPI, it offers a full set of features with unmatched possibilities for an Open Source product.

Despite this, XCP lacks a free, simple and open source interface. That’s why the project was rebooted. Other interesting projects are now dead (like OpenXenManager, a clone of XenCenter). To avoid this kind of scenario, a clear intention of the XO team is to keep the project alive: a “release often” policy, listening to the community, and delivering commercial support to obtain the resources the project needs to live. The original team behind XO has created its own company to sustain this durability. Furthermore, XO’s license is the AGPL.

Current state

XO was only just rebooted in December, but we wanted a first version quickly, at least for testing purposes (architecture validation, feedback, suggestions). That’s why this release is quite light: we focused on the global design, picking the right technologies for the future, because we think it’s better to analyse and implement a strong basis rather than do ugly stuff which could jeopardize the project later. So, we are proud to present XO “Archlute”, the first step in web management for XCP.


Announcing Project Zeus: XenAPI in Fedora, CentOS and the EPEL

The XCP team would like to announce Project Zeus, our port of the XCP toolstack to Fedora and CentOS (through the EPEL). This is a follow-on to Project Kronos, which brought the XCP toolstack to Debian-based systems. This will give users the ability to do ‘yum install xcp-xapi’ to build a system that is functionally equivalent to the normal XCP. Our target for this project is Fedora 17, which will be released in May.

We don’t have any code to share yet, but packaging is currently underway. We will be able to reuse most of the work that we did in Project Kronos to port xapi to Debian, so Zeus should take significantly less effort to accomplish. I’d like to thank Pasi Kärkkäinen, M A Young, David Nalley and Eric Christensen for volunteering to lead the packaging effort. Here are some useful links:

XCP 1.5 Beta now available

I’m happy to announce that XCP 1.5 Beta is available! This release comes with a number of new features, most notably the Xen 4.1 hypervisor. Please go to the downloads page to download and test the beta release. If you would like to report a possible bug, please email the xen-api mailing list with the subject “XCP 1.5 BETA BUG: <subject>” (and see this for a list of bug reporting guidelines for the Xen community).

  • Host Architectural Improvements. XCP 1.5 now runs on the Xen 4.1 hypervisor, provides GPT support and a smaller, more scalable Dom0.
  • GPU Pass-Through. Enables a physical GPU to be assigned to a VM providing high-end graphics.
  • Increased Performance and Scale. Supported limits have been increased to 1 TB memory for XCP hosts, and up to 16 virtual processors and 128 GB virtual memory for VMs. Improved XCP Tools with smaller footprint.
  • Networking Improvements. Open vSwitch is now the default networking stack in XCP 1.5 and now provides formal support for Active-Backup NIC bonding.
  • SR-IOV Improvements. Improved scalability and certification with the SR-IOV Test Kit. Experimental SR-IOV with XenMotion support with Solarflare SR-IOV adapters.
  • Integrated Site Recovery (Disaster Recovery). Remote data replication between storage arrays with fast recovery and failback capabilities. Integrated Site Recovery works with any iSCSI or Hardware HBA storage repository.
  • Virtual Appliance Support (vApp). Ability to create multi-VM and boot sequenced virtual appliances (vApps) that integrate with Integrated Site Recovery and High Availability. vApps can be easily imported and exported using the Open Virtualization Format (OVF) standard.
  • VM Import & Export Improvements. Full support for VM disk and OVF appliance imports directly from XenCenter with the ability to change VM parameters (virtual processor, virtual memory, virtual interfaces, and target storage repository) with the Import wizard. Full OVF import support for XCP, XenConvert and VMware.
  • Enhanced Guest OS Support. Support for Ubuntu 10.04 (32/64-bit). Updated support for Debian Squeeze 6.0 64-bit, Oracle Enterprise Linux 6.0 (32/64-bit) and SLES 10 SP4 (32/64-bit). Experimental VM templates for CentOS 6.0 (32/64-bit), Ubuntu 10.10 (32/64-bit) and Solaris 10.

Note that XCP 1.5 is the open source edition of Citrix XenServer 6.0.

Xen.org 2011 Year in Review

It truly was an amazing year for Xen.org! The key highlights included Dom0 support going into the mainline Linux kernel, Project Kronos, and a renewed focus on Xen for ARM. All three of these projects are examples of standing on the shoulders of giants.

In 2011, Xen.org welcomed Lars Kurth as our community manager. Lars’ impact can be seen most notably in a formalized governance model, a new Xen wiki, and numerous events that Xen.org held and attended (described below).

Technology

Xen.org Events

Xen.org at External Community Events

Ian Pratt gave his thoughts on Xen past, present and future at Xen Summit Asia 2011. His slides and a video are available (slides: http://www.slideshare.net/xen_com_mgr/xenorg-the-past-the-present-and-exciting-future; video: http://vimeo.com/33056576).