Category Archives: Commentary

A community member shares a viewpoint

A Brief Introduction to the Xen Project and Virtualization from Mohsen Mostafa Jokar

Mohsen Mostafa Jokar is a Linux administrator who works at the newspaper Hamshahri as a network and virtualization administrator. His interest in virtualization goes back to his school days, when he saw Microsoft Virtual PC for the first time. He installed it on a PC with 256 MB of RAM and used it to virtualize Windows 98 and DOS.

In addition to his day-to-day Linux and virtualization work, Mohsen has hands-on experience with Fedora Core, Knoppix, Red Hat, Bochs, QEMU, Xen, Citrix XenServer and VMware ESXi.

He recently wrote an introductory book on Xen called “Hello Xen Project,” which is now available via the Xen wiki. It covers a brief history of virtualization and the Xen Project, Xen domains and GRUB, using the Xen Project, and having fun with Xen. He hopes that more Xen users and experts will share their expertise and knowledge about this strong, stable and reliable virtualization platform through the wiki page.

A celebration of Xen Project 4.9 and the wiki book.

In addition to his fascination with virtualization, Mohsen is a translator and an author. He has translated and written books for beginners and professional users that focus on virtualization, security and Linux. A few books that he has worked on as a technical reviewer include “Elixir in Action,” “Learn Git in a Month of Lunches,” “Mesos in Action” and “Reverse Engineering for Beginners.”

My GSoC Experience: Allow Setting up Shared Memory Regions between VMs from xl Config File

This blog post was written by Zhongze Liu, a student studying information security at Huazhong University of Science and Technology in Wuhan, China. He recently took part in GSoC 2017, where he worked closely with the Xen Project community on “Allowing Sharing Memory Regions between VMs from xl Config.” His interests are low-level hacking and system security, especially cloud and virtualization security.

I got to know the Xen Project about a year ago, when I was working on a virtualization security project in a system security lab. It was the first time I had hands-on experience with a Type-1 hypervisor. I was very interested in its internals and wanted to explore them by reading the code and doing some hacking on it. That is exactly what I got to do this summer as a GSoC student with the Xen Project community. My specific focus was on setting up shared memory regions among VMs from a new xl config entry.

The purpose of this GSoC project is to allow simple guests that don’t have grant table support to be able to communicate via one or more shared memory regions. Such guests are not uncommon in the embedded world, and this project makes it possible for these poor guests to communicate with their friends.

This project involves many components of Xen, from the xl utility and xenstore all the way down to the hypervisor itself. The implementation plan is quite straightforward: (1) during domain creation: parse the config -> map the pages -> record in the xenstore filesystem what has been done; (2) during domain destruction: read the current status from xenstore -> unmap the pages -> clean up the related xenstore entries. More details can be found in my proposal posted on the xen-devel mailing list. The tangible outcome is a patch set touching xl, libxl, libxc, XSM and FLASK.
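
To give a feel for the xenstore bookkeeping steps, here is a minimal standalone sketch that uses the regular libxenstore API (xs_open, xs_write, xs_rm) to record a region at creation time and remove the entries again at destruction time. The path layout and key names below are invented for illustration only; they are not the layout used by the actual patch series, which is documented in the proposal on xen-devel.

    /*
     * Illustrative sketch only: record and remove bookkeeping entries for a
     * shared-memory region in xenstore, using the standard libxenstore API.
     * The /local/hypothetical-shared-mem path is made up for this example.
     * Build with: gcc shmem-xs-sketch.c -lxenstore (needs a running Xen host).
     */
    #include <stdio.h>
    #include <string.h>
    #include <xenstore.h>

    static void write_entry(struct xs_handle *xsh, const char *path, const char *value)
    {
        if (!xs_write(xsh, XBT_NULL, path, value, strlen(value)))
            fprintf(stderr, "failed to write %s\n", path);
    }

    int main(void)
    {
        /* Connect to xenstored (requires sufficient privileges). */
        struct xs_handle *xsh = xs_open(0);
        if (!xsh) {
            perror("xs_open");
            return 1;
        }

        /* "Domain creation" step: record what has been set up. */
        const char *base = "/local/hypothetical-shared-mem/region0"; /* made-up path */
        char path[256];

        snprintf(path, sizeof(path), "%s/master", base);
        write_entry(xsh, path, "1");        /* domid of the owning guest   */
        snprintf(path, sizeof(path), "%s/slave", base);
        write_entry(xsh, path, "2");        /* domid of the mapping guest  */
        snprintf(path, sizeof(path), "%s/gfn", base);
        write_entry(xsh, path, "0x80000");  /* guest frame where the region starts */

        /* "Domain destruction" step: clean up the related entries. */
        xs_rm(xsh, XBT_NULL, base);

        xs_close(xsh);
        return 0;
    }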

I met quite a few challenges during the project. The first and biggest one turned out to be designing an appropriate syntax for the new config entry: the syntax has to be flexible and friendly to users. The hardest part was controlling the stage-2 page permissions and cache attributes: there is currently no hypercall to set stage-2 page attributes, yet clients are asking for control over them. I read many documents about stage-2 page attributes on both x86 and ARM, and wrote a proposal for a new hypercall that would solve this issue.

After making this proposal, I realized that it would take too much time to settle its details, let alone implement it. After discussing the problem with my mentors, we decided to leave this as a TODO (see the “Future Directions” section in the project proposal) and to support only the default attributes in the very first version of this project.

The next challenge: the “map the pages” step in the plan is easier said than done. After implementing the toolstack side, I moved on to test my code, but kept getting errors when mapping the pages. By scattering printks along the whole code path, I found what was blocking the way: on x86, adding foreign pages from one DomU to another by modifying p2m entries is not allowed.

Why? (1) The default XSM policy doesn’t allow it; and (2) p2m tear-down is not implemented on x86, so allowing such mappings would leave the page reference counts in an inconsistent state.

Fixing (2) is not a trivial task, but luckily p2m tear-down is already implemented on the ARM side, so I decided to mark the new config entry as unsupported on x86 and continue with the ARM implementation. The fix for (1) turned out to be a set of changes to the XSM interface for xsm_map_gmfn_foreign, the dummy XSM policy, and the corresponding FLASK hook.

The last challenge I’ll mention is testing. To test the ARM code, I followed the instructions on the Xen wiki for setting up an emulator and for cross-compiling Xen and the toolstack for ARM. I’m not using Debian, so some of the handy tools Debian provides were not available to me, and I had to find alternatives for several of the critical steps. During my experiments I found Docker to be the most distribution-independent solution, and one that adds very little performance overhead. I created a Debian-based Docker image with all the tools and dependencies required to build Xen, and every time I wanted to launch a build I just needed to run ‘docker run -v local/xen/path:docker/xen/path -it image-name build-script-name.sh‘. I’m planning to post my Dockerfile to the Xen wiki so that others can build their own cross-compiling environment with a simple ‘docker build’.

I’ve really learned a lot during this process, from my mentors and from the Xen Project community. In particular:

  • I’ve improved my coding skills
  • I’ve learned more about the Xen Project and its internals
  • I’ve learned many efficient git tricks, which will be very useful in my future projects
  • I’ve read about memory management on ARM and x86
  • I’ve learned how to set up a rootfs, emulator, kernel, drivers and cross-compiling environment to test out ARM programs
  • And most importantly, I’ve learned how to work with an open source community.

And no words are strong enough to express my many thanks to all the people in the community who have helped me so far, especially Stefano, Julien, and Wei. They’ve been very supportive and responsive from the very beginning, giving me valuable suggestions and answering my sometimes stupid questions.

I’m very glad that I was invited to the Xen Project Summit in Budapest. It was really a great experience to meet so many interesting people there. And many thanks to Lars and Mary, who helped me in getting my visa to the event and offered me two T-shirts to help me through the hard times when my luggage was delayed.

The GSoC internship is coming to an end, but it is only my first step in contributing to the Xen Project. I like this community, and I look forward to contributing more to it and learning more from it.


My GSoC experience: Fuzzing the hypervisor

This blog post was written by Felix Schmoll, currently studying Mechanical Engineering at ETH Zurich. After obtaining a Bachelor’s degree in Computer Science from Jacobs University, he spent the summer fuzzing the hypervisor as a Google Summer of Code student. His main programming interests are low-level work and building scalable applications.

Five months ago, I had never even heard of fuzzing, but this summer, I worked on fuzzing the Xen Project hypervisor as a Google Summer of Code student.

For everybody who is not familiar with fuzzing: it is a way to test interfaces. The most primitive form is to repeatedly generate random input and feed it to the interface. A more advanced version is coverage-guided fuzzing, which uses information about the code path taken by the binary to mutate the input further. The goal of this project was to build a prototype for fuzzing the hypercall interface, to see whether one could make the hypervisor crash with a specific sequence of hypercalls.
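
As a side note, the feedback loop at the heart of a coverage-guided fuzzer is simple enough to sketch in a few lines of C. The toy program below is not AFL, and its hard-coded run_target function merely stands in for an instrumented target (or, in this project, for Xen reporting coverage back through a new hypercall), but it shows the essential idea: mutate an input, run it, and keep it only if it reaches code that no previous input has reached.

    /*
     * Toy illustration of coverage-guided fuzzing (not AFL itself): mutate a
     * random input, run it against a target, and keep the mutant only if it
     * touches a branch we have not seen before.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAP_SIZE 256

    static uint8_t coverage[MAP_SIZE];   /* edges hit by the current run  */
    static uint8_t seen[MAP_SIZE];       /* edges hit by any previous run */

    /* Stand-in for the instrumented target; a real fuzzer gets this map from
     * compiler instrumentation (or, in the Xen port, via a hypercall). */
    static void run_target(const uint8_t *in, size_t len)
    {
        memset(coverage, 0, sizeof(coverage));
        if (len > 0 && in[0] == 'X') coverage[1] = 1;
        if (len > 1 && in[1] == 'E') coverage[2] = 1;
        if (len > 2 && in[2] == 'N') coverage[3] = 1;   /* deepest path */
    }

    static int new_coverage(void)
    {
        int found = 0;
        for (int i = 0; i < MAP_SIZE; i++)
            if (coverage[i] && !seen[i]) { seen[i] = 1; found = 1; }
        return found;
    }

    int main(void)
    {
        uint8_t input[4] = { 0 };
        srand(42);

        for (int iter = 0; iter < 100000; iter++) {
            uint8_t mutant[4];
            memcpy(mutant, input, sizeof(input));
            mutant[rand() % sizeof(mutant)] = (uint8_t)rand();  /* random mutation */

            run_target(mutant, sizeof(mutant));
            if (new_coverage()) {
                memcpy(input, mutant, sizeof(input));           /* keep interesting input */
                printf("iter %d: kept input %.3s\n", iter, (char *)input);
            }
        }
        return 0;
    }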

American Fuzzy Lop (AFL) is by far the most popular fuzzer, so it was chosen as the one to run against the hypervisor. As it is a fuzzer for user-space programs, it had to be adapted to target the hypervisor. The first step was to allow it to obtain coverage feedback from Xen by implementing a new hypercall. Further, a mechanism was needed to execute the hypercalls from a domain other than dom0 (there are many legitimate ways to stop the hypervisor from dom0). For this purpose, an XTF test case was instrumented to run as a server, receiving test cases from an AFL instance. In the end, changes were made to the hypervisor, libxl, xenconsole, XTF and AFL.

The biggest challenge of all was finding my way around the code base of Xen. A lot of components were relevant to the project, and it would be unrealistic to expect anybody to read all of the code at once. While documentation was at times scarce, a helpful community of experts was always available on IRC. It was also a great experience to meet these people at the Xen Project Summit in Budapest.

The result of my summer project is a set of patches. While no bugs were actually found (i.e., the hypervisor never crashed), valuable experience was collected for future projects. I am confident that by building on the prototype it will be possible to improve the reliability of Xen. A first step would be to pass the addresses of valid buffers into hypercalls. For a description of further possible improvements, please read the technical summary of the project.

Lastly, I would like to thank everybody involved with GSoC and the Xen Project, and in particular my great mentor Wei Liu, for letting me experience what it is like to work on a well-led open-source project.

How To Shrink Attack Surfaces with a Hypervisor

A software environment’s attack surface is the sum of the points at which an unauthorized user or malicious adversary can enter or extract data. The smaller the attack surface, the better. Linux.com recently sat down with Doug Goldstein (https://github.com/cardoe or @doug_goldstein) to discuss how companies can use hypervisors to reduce attack surfaces and why the Xen Project hypervisor is a perfect choice for security-first environments. Doug is a principal software engineer at Star Lab, a company focused on providing software protection and integrity solutions for embedded systems.

You can read the full interview here.

What You Need to Know about Recent Xen Project Security Advisories

Today the Xen Project announced eight security advisories: XSA-191 through XSA-198. The bulk of the underlying vulnerabilities were discovered and fixed during the hardening phase of the Xen Project Hypervisor 4.8 release (expected in early December). The Xen Project has adopted a security-first approach to publishing new releases.

In order to increase the security of future releases, members of the Xen Project Security Team and key contributors to the Xen Project actively search for and fix security bugs in code areas where vulnerabilities were found in past releases. The contributors use techniques such as code inspection, static code analysis, and additional testing with fuzzers such as American Fuzzy Lop. These fixes are then backported to older Xen Project releases with security support and published in bulk to make it easier for downstream consumers to apply security fixes.

Before we declare a new Xen Project feature as supported, we perform a security assessment (as was done, for example, before declaring the Credit2 scheduler as supported). In addition, the contributors focused on security have started crafting tests for each vulnerability and integrating them into our automated regression testing system, which runs regularly on all maintained versions of Xen. This ensures that a patch is applied to every vulnerable version, and that no bug is accidentally reintroduced as development moves forward.

The Xen Project’s mature and robust security response process is optimized for cloud environments and downstream Xen Project consumers to maximize fairness, effectiveness and transparency. This includes not publicly discussing any details with security implications during our embargo period. This encourages anyone who finds a bug to report it to the Xen Project Security Team, and allows the team to assess, respond, and prepare patches before public disclosure and broad compromise occur.

During the embargo period, the Xen Project does not publicly discuss any details with security implications except:

  • when enlisting technical assistance from other parties;
  • when issuing a Xen Project Security Advisory (XSA), which is pre-disclosed only to members of the Xen Project Pre-Disclosure List (see www.xenproject.org/security-policy.html); and
  • when necessary to coordinate with other affected projects.

The Xen Project Security Team assigns and publicly releases numbers for vulnerabilities; this is the only information shared publicly during the embargo period. The list of “XSA Advisories, Publicly Released or Pre-Released” is available at xenbits.xen.org/xsa.

The latest advisories, XSA-191 through XSA-198, can all be found at xenbits.xen.org/xsa.

Any Xen-based public cloud is eligible to be on our “pre-disclosure” list. Cloud providers on the list were notified of the vulnerabilities and given the patches two weeks before the public announcement, to make sure they all had time to apply them to their servers.

Distributions and other major vendors of Xen Project software were also given the patches in advance, to make sure they had updated packages ready to download as soon as the vulnerabilities were announced. Private clouds and individuals are urged to apply the patches or update their packages as soon as possible.

Fixes for all of the above XSAs that affect the hypervisor can be deployed using the Xen Project Live Patching functionality, which enables reboot-free deployment of security patches to minimize disruption and downtime during security upgrades for system administrators and DevOps practitioners. The Xen Project encourages its users to apply these patches.

More information about the Xen Project’s Security Vulnerability Process, including the embargo and disclosure schedule, policies around embargoed information, information sharing among pre-disclosed list members, a list of pre-disclosure list members, and the application process to join the list, can be found at: www.xenproject.org/security-policy.html

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. x86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce that attack surface, but it is a very difficult task that requires intimate knowledge of the app running inside; at any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them into the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application packaged to be self-contained, complying with the OCI Image Format or the Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls that are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result, a Linux system exposes a very high number of calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with this, but hypervisors are very accomplished in this respect; after all, hypervisors don’t need to provide full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps: they cause duplication of functionality between the guest and the host. For example, the network stack is traversed twice, first in the DomU and then in Dom0. This is unnecessary. It is better to raise the level of the hypervisor abstractions by paravirtualizing a small set of syscalls directly.

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls sit at a higher level and are made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0, and a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring; the backend implements the syscalls and returns success or failure to the caller. The protocol creates a new ring for each active socket, and the ring size is configurable on a per-socket basis. Received data is copied to the ring by the backend, while data to be sent is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
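
To make the ring mechanism a little more concrete, here is a small, self-contained sketch of a request ring shared between a frontend and a backend. The struct layout and field names are invented for this example and do not match the actual PV Calls wire format, which is defined in the specification on xen-devel; in the real protocol the two ends live in different domains, the ring pages are shared via grant references, and each side kicks the other through the per-ring event channel instead of polling.

    /*
     * Illustrative sketch of the idea behind a PV Calls-style request ring.
     * Struct layout and field names are invented for this example; the
     * authoritative wire format is the PV Calls specification on xen-devel.
     */
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define RING_SLOTS 8          /* must be a power of two */

    /* One forwarded POSIX-like call, e.g. connect() on a socket. */
    struct fake_req {
        uint32_t req_id;          /* matches the eventual response            */
        uint32_t cmd;             /* which call: socket, connect, sendmsg ... */
        uint64_t arg;             /* call-specific payload (simplified)       */
    };

    struct fake_ring {
        uint32_t prod, cons;               /* free-running producer/consumer indices */
        struct fake_req slots[RING_SLOTS];
    };

    /* Frontend side: queue a request (in Xen, an event channel notification
     * would follow so the backend wakes up and services the ring). */
    static int ring_put(struct fake_ring *r, const struct fake_req *req)
    {
        if (r->prod - r->cons == RING_SLOTS)
            return -1;                            /* ring full */
        r->slots[r->prod % RING_SLOTS] = *req;
        r->prod++;                                /* real code needs a write barrier here */
        return 0;
    }

    /* Backend side: pop the next request and implement the call. */
    static int ring_get(struct fake_ring *r, struct fake_req *out)
    {
        if (r->cons == r->prod)
            return -1;                            /* ring empty */
        *out = r->slots[r->cons % RING_SLOTS];
        r->cons++;
        return 0;
    }

    int main(void)
    {
        struct fake_ring ring;
        memset(&ring, 0, sizeof(ring));

        struct fake_req req = { .req_id = 1, .cmd = 2 /* pretend: connect */, .arg = 0 };
        ring_put(&ring, &req);

        struct fake_req got;
        while (ring_get(&ring, &got) == 0)
            printf("backend got request %u, cmd %u\n", got.req_id, got.cmd);
        return 0;
    }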

We are still running virtual machines, but mainly to restrict the vast majority of the application’s syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements the syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored for container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to come from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for this model: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 Gbit/s with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 Gbit/s and outperform not just netfront/netback but even Docker.

The benchmarks have been run on an Intel Xeon D-1540 machine, with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vCPUs each, pinned. DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very much appreciated!