Category Archives: Commentary

A community member shares a viewpoint

Google Summer of Code Project, TinyVMI: Porting LibVMI to Mini-OS

This blog post comes from Lele Ma, a Ph.D. student at William and Mary. He recently worked as a Google Summer of Code intern with the Honeynet Project.

Introduction

This post introduces the project I worked on with the Honeynet Project at Google Summer of Code this year. The TinyVMI project ports a library (LibVMI) into a tiny operating system (Mini-OS). After porting, all of LibVMI's functionality runs inside a tiny virtual machine, which is much smaller and performs better than the same library running on a Linux OS.

Mini-OS & Unikernels

Mini-OS is a tiny operating system demo distributed with the source of the Xen Project Hypervisor (abbreviated as Xen below). It has been the basis for the development of several unikernels, such as ClickOS and Rump kernels. A unikernel can be viewed as a minimized operating system with the following features:

  • No ring0/ring3, or kernel/user mode separation. Traditional operating systems, like Linux, separate programs into kernel mode and user mode to prevent malicious users (applications) from accessing kernel memory. In unikernels like Mini-OS, however, there is only one mode: ring0, or kernel mode. This eliminates the burden of context switching between the two modes, reducing both kernel code size and runtime overhead.
  • A minimal set of libraries. Instead of shipping with many system/application libraries to provide a general purpose computing environment, a unikernel aims to be configured with only the libraries needed by the application that runs in it; it is therefore also called a library operating system. For example, Mini-OS can be configured with a libc so that applications can be written in C.

Fig.1 General Purpose OS vs. Mini-OS Unikernel

As shown in Fig.1, a unikernel is much smaller than a general purpose OS: it eliminates unnecessary tools, libraries, and even file systems, keeping only the application code and a tiny OS kernel. Unikernels can be more efficient than traditional operating systems, especially on cloud platforms where each specialized application is usually managed in a standalone VM. Unikernels have been proposed as the basis for the next generation of cloud platforms because they can achieve efficiency in several respects, including but not limited to:

  1. Smaller memory footprint. A unikernel requires significantly less memory than a traditional operating system. For example, a Mini-OS VM running the LibVMI application requires only 64MB of main memory, whereas a 64-bit Linux VM would typically be given 4GB of main memory to achieve average performance. The reduced memory footprint allows a single physical machine to host more VMs and reduces the average cost per service.
  2. Faster booting. Since the memory footprint is small and there are no redundant libraries or kernel modules, a tiny OS requires significantly less time to boot than a traditional OS. Booting a tiny OS is much like starting the application itself.
  3. No kernel mode switching. The OS kernel and the application share the same memory region, so the CPU context switches caused by system calls are eliminated. The runtime performance of a unikernel can therefore be much better than that of a traditional OS.
  4. More secure. Each unikernel VM runs only one application, and isolation between applications is enforced by the hypervisor rather than by a shared OS kernel. Compared to process or container isolation in Linux, a unikernel benefits from this lower-level isolation.
  5. Easy deployment and use. A unikernel application is built into a single binary that runs directly as a VM image, which simplifies deployment of the service. All functionality is customized at build time; once deployed, the image requires no manual modification other than replacing the whole binary package.

In brief, Mini-OS is a tiny OS that originated from the Xen Project hypervisor. Like other unikernels, Mini-OS provides higher performance and a more secure computing environment than a traditional operating system in the cloud.

Why port LibVMI to Mini-OS?

LibVMI is a security-critical library that can be used to view a target VM's raw memory from another guest VM, thereby gaining a view of almost all activity on the target VM.
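As a rough illustration of what this looks like in practice, here is a minimal sketch using the older-style LibVMI initialization call (newer LibVMI releases changed the init API); the domain name and the kernel address read below are placeholders, and error handling is mostly omitted:

    #include <stdio.h>
    #include <inttypes.h>
    #include <libvmi/libvmi.h>

    int main(void)
    {
        vmi_instance_t vmi;
        uint64_t value = 0;

        /* Attach to the target VM by name (older-style vmi_init call). */
        if (vmi_init(&vmi, VMI_XEN | VMI_INIT_COMPLETE, "target-vm") == VMI_FAILURE)
            return 1;

        /* Read 8 bytes of the target kernel's virtual memory (pid 0 = kernel);
         * the address used here is purely illustrative. */
        vmi_read_64_va(vmi, 0xffffffff81000000ULL, 0, &value);
        printf("read: 0x%" PRIx64 "\n", value);

        vmi_destroy(vmi);
        return 0;
    }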

Traditionally, LibVMI runs in Dom0 on the hypervisor. However, Dom0 is already very big even without LibVMI in it. I got the idea of separating LibVMI from Dom0 from the following observations:

  1. Dom0 is a general purpose OS hosting many everyday applications, such as administrator tools. LibVMI, however, is a special purpose library that is not typically used day to day, and there is almost no direct communication between LibVMI and other applications. It is therefore not necessary to install LibVMI in Dom0.
  2. Security risk. Dom0 is a critical domain for the hypervisor platform, and introducing a new code base to it also introduces new security risks. Other applications on Dom0 could leverage kernel vulnerabilities to compromise LibVMI, and vice versa: a bug in LibVMI could crash other applications or even the entire Dom0 kernel.
  3. Performance overhead. As introduced above, a general purpose OS is large and inefficient for running a special purpose application. CPU mode switching, a large memory footprint, and process scheduling all introduce overhead in Dom0.

Therefore, we propose porting LibVMI to the tiny Mini-OS; we name the result TinyVMI and use it to explore whether the above benefits can be achieved.

Challenges

First, the hypervisor isolates guest VMs from reading each other's memory pages, so a guest VM must be granted sufficient permissions before it can introspect another VM's memory. Second, LibVMI depends on several libraries that are not supported in the original Mini-OS. This project therefore needs solutions to both challenges.

Permissions for accessing another VM's memory

To introspect a VM's memory from another guest VM, the first step is to obtain permission from the hypervisor. By default, the memory pages of each VM are strictly isolated: one VM is not allowed to access the memory pages of another. Although the hypervisor allows programmers to share memory pages between two VMs via grant tables, this requires the target VM to explicitly offer the page for sharing. Since the target VM is not trusted and no changes should be made to it, LibVMI instead uses foreign memory mapping hypercalls to remap the target VM's memory pages into its own address space. For a guest VM (or Dom0), the permission to map a foreign page (a page belonging to the target VM) into its own address space is controlled by the Xen Security Module (XSM), which is introduced below.
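On Dom0/Linux, the foreign-mapping path that LibVMI relies on goes through libxc. The following is a minimal sketch of that mechanism; the domain ID and frame number are placeholders, and Mini-OS reaches the same hypercall through its own interfaces rather than through the Linux libxc build:

    #include <stdio.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    int main(void)
    {
        uint32_t domid = 5;          /* placeholder: target domain ID */
        unsigned long gfn = 0x1000;  /* placeholder: guest frame number to map */

        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        if (!xch)
            return 1;

        /* Map one page of the target domain read-only into our address space.
         * XSM must grant this mapping (the map_read permission discussed below). */
        void *page = xc_map_foreign_range(xch, domid, XC_PAGE_SIZE, PROT_READ, gfn);
        if (page) {
            printf("first byte: 0x%02x\n", ((unsigned char *)page)[0]);
            munmap(page, XC_PAGE_SIZE);
        }

        xc_interface_close(xch);
        return 0;
    }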

Furthermore, Xen event channels allow a guest VM to monitor the target VM's memory events in real time with the help of hardware interrupts. A ring buffer shared between the hypervisor and the guest kernel is used to transfer event information, and accessing this ring buffer also requires XSM permission.

The Xen Security Module (XSM) uses FLASK policies, as in SELinux, to enforce Mandatory Access Control (MAC) between domains. By default, every permission is denied unless it is explicitly allowed in the policy. Permissions are granted according to the categories a guest domain belongs to, such as its type, role, user, and attributes.

The category of a VM is labeled in the configuration file we use to create it via xl create <config_file>. For example:
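A sketch of such an entry (the exact label string depends on the FLASK policy in use) is:

    seclabel='system_u:system_r:domU_t1'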

will label the VM with type domU_t1, role system_r, and user system_u. Type is the lowest level of the hierarchy: multiple types can be grouped under one role, and multiple roles under one user.

Permissions are granted based on a VM's type. For example, the map_read permission allows a domain to map another domain's memory read-only. The policy:
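(sketched here with the mmu object class that Xen's FLASK policy uses for memory-mapping permissions)

    allow domU_t1 domU_t2:mmu map_read;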

will allow a VM with type domU_t1 to read the memory of another VM with type domU_t2.

In addition to the permissions granted by XSM, we also need permission to read information from Xenstore, which is used to obtain metadata about the target VM, such as resolving the domain's string name to its domain ID. Xenstore permissions can be inspected with the command xenstore-ls -p:
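Illustrative (truncated) output; the exact entries will differ on a real system, and an annotation such as (n0) names the owning domain plus any extra read/write grants:

    local = ""   (n0)
     domain = ""   (n0)
      0 = ""   (n0)
       name = "Domain-0"   (n0)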

The meaning of each permission can be found in the manual. The command xenstore-chmod can be used to grant read permission to particular VMs. For example, to allow the VM with domain ID 8 to read the Xenstore directory /local/domain, you can run:
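One plausible invocation (a sketch: -r applies the change recursively, n0 keeps Dom0 as the owning domain with no default access for others, and r8 grants read access to domain 8):

    xenstore-chmod -r /local/domain n0 r8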

Build New Libraries into Mini-OS

The next challenge is building new libraries into Mini-OS. Mini-OS is an example of a minimal operating system: to keep the kernel small, only a few libraries can be built into it, such as newlib as the C library, Xen-related libraries such as libxc to communicate with the hypervisor, and lwIP for basic networking.

To port LibVMI to Mini-OS, two more libraries are needed: a JSON library, libjson-c, to parse Rekall profiles, and a library of utility data structures, GLib.

In theory, most libraries written in C, such as libjson-c, can be built into Mini-OS with the help of newlib. This post introduces how to build new libraries. However, some libraries, such as GLib, need to be manually customized for Mini-OS by eliminating the unsupported portions.

Furthermore, security applications written in C++ can also be ported to Mini-OS. For example, DRAKVUF is a binary analysis system built on top of LibVMI and Xen, and a portion of its code is written in C++. To build this code in Mini-OS, we need to cross-compile the C++ standard libraries into the tiny kernel.

Project Status & Results

Functions added to Mini-OS

  • Support for LibVMI functions to introspect Linux and Windows guests on the x86 architecture. Both memory access and event support are implemented. The ARM architecture and other OS kernels (such as FreeBSD) have not been explored yet.
  • A customized GLib and a statically compiled libjson-c are cross-compiled into Mini-OS.
  • C++ language support. The C++ standard library from GCC was cross-compiled into static libraries, such as libgcc and libstdc++. Now we can program Mini-OS in C++, not only in C. Detailed steps can be found in this post.
  • A GitHub documentation site and a blog are maintained to document how to build and run TinyVMI, as well as to track the progress of each step during the summer.

Performance Analysis

To evaluate the TinyVMI system, we conducted a simple analysis and experiment to show its efficiency. For comparison, we built two domains with LibVMI on the same hypervisor: one guest VM running Mini-OS with LibVMI, and Dom0 running Linux (Ubuntu 16.04) with LibVMI. The target VM being introspected is a 64-bit Linux (Ubuntu 16.04). Results are shown in Fig.2 and Fig.3.

Fig.2 Code Size of LibVMI and Different Kernels

Fig.3 Time in Walking Through Page Table

Fig.2 shows the overall code size of each OS with LibVMI in it. LibVMI with Mini-OS totaled 83K lines of code (LoC), while LibVMI with the Linux kernel had 177K LoC, a reduction of more than 50%. Note that the LoC count for the Linux kernel does not include any driver code and thus reflects the minimal possible size of a Linux kernel; with drivers included, a Linux system can exceed 15M LoC.

Fig.3 shows the time taken to read one page by walking the four levels of the page table while introspecting a 64-bit Linux guest VM, averaged over 500 consecutive page reads. LibVMI in Mini-OS took 3.7 microseconds, while LibVMI in Linux took 5.7 microseconds, saving more than 30% of the time.

Conclusion

To briefly conclude the project: we have successfully ported the core functionality of LibVMI into Mini-OS, the tiny OS on Xen. By customizing the XSM policy and Xenstore permissions, a guest VM is granted the permissions needed to introspect another guest VM via VMI techniques. By customizing and cross-compiling static libraries into Mini-OS, we have built LibVMI into a tiny OS, enabling a tiny VM to introspect both Linux and Windows guest VMs. Evaluations show that code size is reduced by more than 50% and performance is improved by more than 30% compared to running the same VMI operations from Dom0.

Future Directions

  • DRAKVUF integration. In the last week of GSoC, C++ language support was added to TinyVMI with the help of this post from Notes to self. The next step would be cross-compiling the DRAKVUF system into TinyVMI, which would enable more applications to take full advantage of the LibVMI interfaces already provided in Mini-OS.
  • Dom0 Introspection. We all know Dom0 is huge, and although much work has been done to disaggregate it, it is still huge. TinyVMI itself has a small trusted computing base (TCB), but we still need to trust Dom0 to enforce the XSM policies, which enlarges the TCB of the system significantly. And as long as we have to trust Dom0, monitoring Dom0's memory from TinyVMI is of little use. A further step in disaggregating Dom0 would be to separate the XSM policy management interface into another sub-domain, or even into the same domain as TinyVMI. Splitting this out would make it possible to remove Dom0 from the trusted computing base and allow TinyVMI to monitor Dom0 via VMI techniques.

Acknowledgment

Thanks to my mentors, Steven Maresca and Tamas K Lengyel, for accepting me as a student in GSoC this year. This was my first time at GSoC, and this exciting project could not have been achieved without your prompt, helpful guidance and gracious patience. Thanks to Zibby Keaton for grammar checking this post. And thanks to the Google Summer of Code organizers for providing such a great opportunity for us to explore the world of open source!

 

The Xen Project is participating in 2018 Summer round of Outreachy

This is a quick reminder that the Xen Project is again participating in Outreachy (May 2018 to August 2018 Round). Please check the Outreachy application page for more information.

The Outreach Program for Women has been helping women (cis and trans), trans men, and genderqueer people get involved in free and open source software worldwide. Note that the program has been extended and is now also open to people from other groups underrepresented in FOSS: specifically, the program is open to residents and nationals of the United States of any gender who are Black/African American, Hispanic/Latin, American Indian, Alaska Native, Native Hawaiian, or Pacific Islander. Information on eligibility and the application process can be found here.

A Brief Introduction to the Xen Project and Virtualization from Mohsen Mostafa Jokar

Mohsen Mostafa Jokar is a Linux administrator who works at the newspaper Hamshahri as a network and virtualization administrator. His interest in virtualization goes back to when he was at school and saw Microsoft Virtual PC for the first time. He installed it on a PC with 256 MB of RAM and used it to virtualize Windows 98 and DOS.

Beyond Linux and virtualization, Mohsen is also familiar with Fedora Core, Knoppix, RedHat, bochs, Qemu, Xen, Citrix XenServer and VMWare ESXi.

He recently wrote an introductory book on Xen called “Hello Xen Project,” which is now available via this wiki. It covers a brief history of virtualization and the Xen Project, dom(s) and GRUB, using the Xen Project, and having fun with Xen. He hopes that more Xen users and experts will share their expertise and knowledge about this strong, stable and reliable virtualization platform through this wiki page.

A celebration of Xen Project 4.9 and the wiki book.


In addition to his fascination with virtualization, Mohsen is a translator and an author. He has translated and written books for beginners and professional users that focus on virtualization, security and Linux. A few books that he has worked on as a technical reviewer include “Elixir in Action,” “Learn Git in a Month of Lunches,” “Mesos in Action” and “Reverse Engineering for Beginners.”

My GSoC Experience: Allow Setting up Shared Memory Regions between VMs from xl Config File

This blog was written by Zhongze Liu, a student studying information security at Huazhong University of Science and Technology in Wuhan, China. He recently took part in GSoC 2017, where he worked closely with the Xen Project community on “Allowing Sharing Memory Regions between VMs from xl Config.” His interests are low-level hacking and system security (especially cloud and virtualization security).

I got to know the Xen Project about one year ago when I was working on a virtualization security project in a system security lab. It was the very first time that I had hands-on experience with a Type-1 hypervisor. I was very interested in its internals and wanted to explore more of it by reading its code and doing some hacking on it. This is also what I was able to do this summer while working as a GSoC student with the Xen Project community. My specific focus was on setting up shared memory regions among VMs from a new xl config entry.

The purpose of this GSoC project is to allow simple guests that don’t have grant table support to be able to communicate via one or more shared memory regions. Such guests are not uncommon in the embedded world, and this project makes it possible for these poor guests to communicate with their friends.

This project involves many components of Xen, from the xl utility and xenstore all the way down to the hypervisor itself. The implementation plan is quite straightforward: (1) during domain creation: parse the config -> map the pages -> write down in the xenstore fs what has been done; (2) during domain destruction: read the current status from xenstore -> unmap the pages -> clean up the related xenstore entries. More details can be found in my proposal posted on the xen-devel mailing list. The tangible outcome is a patch set adding changes to xl, libxl, libxc, xsm and flask.
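As a purely illustrative sketch of the kind of config entry involved (the key names and values here are hypothetical placeholders, not the authoritative syntax that was proposed or merged):

    static_shm = [ 'id=ID1, begin=0x100000, end=0x200000, role=master' ]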

I met quite a few challenges during the project. The first and biggest one turned out to be designing an appropriate syntax for the new config entry: the syntax has to be flexible and friendly to users. The hardest part was how to control the stage-2 page permissions and cache attributes. We currently don't have a hypercall to control the stage-2 page attributes, but the clients are asking for control over them. I read a lot of documents about stage-2 page attributes on both x86 and ARM, and wrote a proposal for a new hypercall that would solve this issue.

After I made this proposal, I realized that discussing its details, let alone implementing it, would take up too much time. After discussing this challenge with my mentors, we decided to leave it as a TODO (see the “Future Directions” section in the project proposal) and only support the default attributes in the very first version of this project.

Next challenge: the “map the pages” step in the plan is easier said than done. After implementing the tool stack side, I moved on to test my code, but kept getting errors when mapping the pages. By putting many printks along the whole code path, I found what was blocking the way: on x86, adding foreign pages from one DomU to another by modifying p2m entries is not allowed.

Why? (1) The default XSM policy doesn't allow this; (2) p2m tear-down is not implemented, so doing this would mess up the reference counts of the pages.

Fixing reason (2) is not a trivial task, but luckily, p2m tear-down is already implemented on the ARM side. So I decided to mark this new config entry as unsupported on x86, and continue to implement the ARM side. The fix to (1) turned out to be some changes to the xsm interface for xsm_map_gmfn_foreign, the dummy xsm policy, and the corresponding flask hook.

The last challenge that I'm going to talk about is testing. To test the ARM code, I followed these instructions on the Xen wiki to set up an emulator, and other instructions on cross-compiling Xen and the tool stack for ARM. I'm not using Debian, so some of the handy tools provided by Debian are not available to me. I had to find alternative solutions for some of the critical steps, and during my experiments I found Docker to be the most distribution-independent solution that also doesn't introduce much performance overhead. I created a Debian-based Docker image with all the tools and dependencies required to build Xen, and every time I launched a build, I just needed to do a ‘docker run -v local/xen/path:docker/xen/path -it image-name build-script-name.sh‘. I'm planning to post my Dockerfile to the Xen wiki so that others can build their own cross-building environment with a simple ‘docker build’.

I’ve really learned a lot during the process, from my mentors, and from the Xen Project community. Additionally:

  • I’ve improved my coding skills
  • I’ve learned more about the Xen Project and its internals
  • I’ve learned many efficient git tricks, which will be very useful in my future projects
  • I’ve read about memory management on ARM and x86
  • I’ve learned how to set up a rootfs, emulator, kernel, drivers, and a cross-compiling environment to test out ARM programs
  • And most importantly, I’ve learned how to work with an open source community.

And no words are strong enough to express my many thanks to all the people in the community who have helped me so far, especially Stefano, Julien, and Wei. They’ve been very supportive and responsive from the very beginning, giving me valuable suggestions and answering my sometimes stupid questions.

I’m very glad that I was invited to the Xen Project Summit in Budapest. It was really a great experience to meet so many interesting people there. And many thanks to Lars and Mary, who helped me in getting my visa to the event and offered me two T-shirts to help me through the hard times when my luggage was delayed.

The GSoC internship is coming to an end, and it’s just my first step to contributing to the Xen Project. I like this community and I am looking forward to contributing to it more and learning more from it.

 

My GSoC experience: Fuzzing the hypervisor

This blog post was written by Felix Schmoll, currently studying Mechanical Engineering at ETH Zurich. After obtaining a Bachelor's in Computer Science from Jacobs University, he spent the summer working on fuzzing the hypervisor as a Google Summer of Code student. His main interests in code are low-level endeavours and building scalable applications.

Five months ago, I had never even heard of fuzzing, but this summer, I worked on fuzzing the Xen Project hypervisor as a Google Summer of Code student.

For everybody who is not familiar with fuzzing: it is a way to test interfaces. The most primitive form is to repeatedly generate random input and feed it to the interface. A more advanced version is coverage-guided fuzzing, which uses information about the code path taken by the binary to further mutate the input. The goal of this project was to build a prototype for fuzzing the hypercall interface, to see whether one could make the hypervisor crash with a specific sequence of hypercalls.
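As a toy, self-contained sketch of the coverage-guided idea (purely illustrative; this is neither AFL nor the hypercall fuzzer described below), the core loop keeps a mutated input only when it reaches a branch that has not been seen before:

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAP_SIZE 64

    static uint8_t coverage[MAP_SIZE];           /* edges hit by the current run */

    /* Toy target: records which branches a 3-byte input reaches. */
    static void target(const uint8_t *buf, size_t len)
    {
        memset(coverage, 0, sizeof(coverage));
        coverage[0] = 1;
        if (len > 0 && buf[0] == 'X') {
            coverage[1] = 1;
            if (len > 1 && buf[1] == 'E') {
                coverage[2] = 1;
                if (len > 2 && buf[2] == 'N')
                    coverage[3] = 1;             /* deepest path */
            }
        }
    }

    int main(void)
    {
        uint8_t seen[MAP_SIZE] = { 0 };
        uint8_t input[3] = { 'A', 'A', 'A' };    /* current best-known input */
        srand(0);

        for (int iter = 0; iter < 100000; iter++) {
            uint8_t candidate[3];
            memcpy(candidate, input, sizeof(input));
            candidate[rand() % 3] = (uint8_t)(rand() % 256);   /* mutate one byte */

            target(candidate, sizeof(candidate));

            /* Keep the mutated input only if it reached a new edge. */
            int new_cov = 0;
            for (int i = 0; i < MAP_SIZE; i++) {
                if (coverage[i] && !seen[i]) {
                    seen[i] = 1;
                    new_cov = 1;
                }
            }
            if (new_cov)
                memcpy(input, candidate, sizeof(input));
        }

        printf("input with best coverage so far: %.3s\n", (char *)input);
        return 0;
    }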

American Fuzzy Lop (AFL) is by far the most popular fuzzer, so it was chosen as the one to run against the hypervisor. As it is a fuzzer for user-space programs, it had to be adapted to target a kernel. The first step was to let it obtain coverage feedback from Xen by implementing a hypercall. Further, a mechanism was needed to execute the hypercalls from a domain other than dom0 (there are many ways to stop the hypervisor from dom0). For this purpose, an XTF test case was instrumented to run as a server, receiving test cases from an AFL instance. In the end, changes were made to the hypervisor, libxl, xenconsole, XTF and AFL.

The biggest challenge of all was finding my way around the code base of Xen. A lot of components were relevant to the project, and it would be unrealistic to expect anybody to read all of the code at once. While documentation was at times scarce, a helpful community of experts was always available on IRC. It was also a great experience to meet these people at the Xen Project Summit in Budapest.

The results of my summer project are numerous patches. While no bugs were actually found (i.e. the hypervisor never crashed), valuable experience was collected for future projects. I am confident that by building on the prototype it will be possible to improve the reliability of Xen. A first step would be to pass the addresses of valid buffers into hypercalls. For a description of more possible improvements, please read the technical summary of the project.

Lastly, I would like to thank everybody involved with GSoC and the Xen Project, and in particular my great mentor Wei Liu, for letting me experience what it is like to work on a well-led open-source project.

How To Shrink Attack Surfaces with a Hypervisor

A software environment's attack surface is defined as the sum of the points through which an unauthorized user or malicious adversary can enter or extract data. The smaller the attack surface, the better. Linux.com recently sat down with Doug Goldstein (https://github.com/cardoe or @doug_goldstein) to discuss how companies can use hypervisors to reduce attack surfaces and why the Xen Project hypervisor is a perfect choice for security-first environments. Doug is a principal software engineer at Star Lab, a company focused on providing software protection and integrity solutions for embedded systems.

You can read the full interview here.