Category Archives: Technical

Google Summer of Code Project, TinyVMI: Porting LibVMI to Mini-OS

This blog post comes from Lele Ma, a Ph.D. student at William and Mary. He was recently a Google Summer of Code intern working with the Honeynet Project.

Introduction

This post introduces the project I worked on with the Honeynet Project at Google Summer of Code this year. TinyVMI ports a library (LibVMI) into a tiny operating system (Mini-OS). After porting, all of LibVMI's functionality runs inside a tiny virtual machine, which is much smaller and performs better than the same library running on a full Linux OS.

Mini-OS & Unikernels

Mini-OS is a tiny demonstration operating system distributed with the source of the Xen Project Hypervisor (abbreviated as Xen below). It has been the basis for the development of several unikernels, such as ClickOS and Rump kernels. A unikernel can be viewed as a minimized operating system with the following features:

  • No ring0/ring3, or kernel/user mode, separation. Traditional operating systems like Linux separate execution into kernel mode and user mode to prevent malicious users (applications) from accessing kernel memory. In unikernels like Mini-OS, however, there is only one mode: ring0, or kernel mode. This eliminates the burden of context switching between the two modes, reducing both kernel code size and runtime overhead.
  • A minimal set of libraries. Instead of shipping with many system and application libraries to provide a general-purpose computing environment, a unikernel is configured with only the libraries necessary for the application that runs in it; it is thus also called a library operating system. For example, Mini-OS can be configured with a libc so that users can write applications in C.

Fig.1 General Purpose OS vs. Mini-OS Unikernel

As shown in Fig.1, a unikernel is much smaller in size: it eliminates all unnecessary tools and libraries, and even the file system, from the OS, keeping only the application code and a tiny OS kernel. Unikernels can be more efficient than traditional operating systems, especially on cloud platforms, where each specialized application is usually managed in a standalone VM. Unikernels are often positioned as the next generation of cloud platform because they achieve efficiency in several respects. These include but are not limited to:

  1. Smaller memory footprint. A unikernel requires significantly less memory than a traditional operating system. For example, a Mini-OS VM with the LibVMI application requires only 64MB of main memory, whereas a 64-bit Linux VM would typically be given 4GB of main memory to achieve average performance. The reduced memory footprint allows a single physical machine to host more VMs and reduces the average cost per service.
  2. Faster booting. Since the image is small and contains no redundant libraries or kernel modules, a tiny OS requires significantly less time to boot than a traditional OS. Booting a tiny OS is just like starting the application itself.
  3. No kernel mode switching. The OS kernel and the application live in the same memory region, so the CPU context switches caused by system calls are eliminated. Therefore, the runtime performance of a unikernel can be much better than that of a traditional OS.
  4. More secure. Each unikernel VM runs only one application, and isolation between applications is enforced by the hypervisor instead of a shared OS kernel. Compared to process or container isolation in Linux, the unikernel benefits from stronger, lower-level isolation.
  5. Easy deployment and use. Unikernel applications are built into a single binary that runs directly as a VM image, which simplifies the deployment of the service. Unikernel applications are designed to be click-and-run: all functionality is fixed at build time, and once deployed, the binary package requires no modification short of replacing the whole package.

In brief, Mini-OS is a tiny OS that originates from the Xen Project hypervisor. Like other unikernels, Mini-OS provides higher performance and a more secure computing environment than a traditional operating system in the cloud.

Why port LibVMI to Mini-OS

LibVMI is a security-critical library that can be used to view a target VM's raw memory from another guest VM, thereby gaining a view of almost all activity on the target VM.

Traditionally, LibVMI runs in Dom0 on the hypervisor. However, Dom0 is already very big even without LibVMI in it. I got the idea of separating LibVMI from Dom0 from the following observations:

  1. Dom0 is a general-purpose OS hosting many everyday applications, such as administrator tools. However, LibVMI is a special-purpose library and usually not for daily use. Furthermore, there is almost no direct communication between LibVMI and other applications. Thus it is not necessary to install LibVMI in Dom0.
  2. Security risk. Dom0 is a critical domain for the hypervisor platform. Introducing a new code base to the kernel would also introduce new security risks. Other applications on Dom0 could leverage kernel vulnerabilities to compromise LibVMI, and vice versa, a bug in LibVMI could crash other applications or even the entire Dom0 kernel.
  3. Performance overhead. As discussed above, a general-purpose OS is large and inefficient for running a special-purpose application. CPU mode switching, large memory footprints, and process scheduling all introduce overhead in Dom0.

Therefore, we propose porting LibVMI to the tiny Mini-OS; the result, named TinyVMI, lets us explore whether we can achieve the above benefits.

Challenges

First, the hypervisor isolates each guest VM from reading other VMs' memory pages. A guest VM must be granted sufficient permissions before it can introspect another VM's memory. Second, LibVMI depends on several libraries that are not supported by the original Mini-OS. This project therefore needed to overcome both of these challenges.

Permissions in accessing other VM’s memory

To introspect a VM's memory from another guest VM, the first step is to obtain permission from the hypervisor. By default, the memory pages of each VM are strictly isolated: no VM is allowed to access the memory pages of another. Although the hypervisor allows programmers to share memory pages between two VMs via grant tables, this requires the target VM to explicitly offer the page for sharing. Since the target VM is untrusted and should not be modified, LibVMI instead uses foreign memory mapping hypercalls to map memory pages of the target VM into its own address space. The permission for a guest VM (or Dom0) to map a foreign page (a target VM's page) into its own address space is controlled by the Xen Security Module (XSM), introduced below.
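
For illustration, here is a minimal sketch of what such a foreign mapping looks like from a privileged domain using libxc (the domain ID and guest frame number below are hypothetical; inside Mini-OS, LibVMI issues the equivalent hypercalls through its Xen driver):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <xenctrl.h>

    int main(void)
    {
        uint32_t domid = 2;         /* hypothetical target domain ID */
        unsigned long gfn = 0x1000; /* hypothetical guest frame number */

        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        if (!xch)
            return 1;

        /* Map one page of the target domain read-only into our address
           space; the hypervisor checks the XSM map_read permission here. */
        void *page = xc_map_foreign_range(xch, domid, XC_PAGE_SIZE,
                                          PROT_READ, gfn);
        if (page) {
            printf("first byte of the page: 0x%02x\n",
                   ((unsigned char *)page)[0]);
            munmap(page, XC_PAGE_SIZE);
        }

        xc_interface_close(xch);
        return 0;
    }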

Furthermore, Xen event channels allow a guest VM to monitor the target VM's memory in real time with the help of hardware interrupts. A ring buffer is shared between the hypervisor and the guest kernel to transfer event information. Accessing this ring buffer also requires an XSM permission.
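
At the API level, LibVMI hides the event channel and shared-ring plumbing behind its events interface. The snippet below is a sketch, following the events API of recent LibVMI releases, of registering a write event on a single guest page; it assumes an already-initialized vmi instance and a hypothetical guest frame number:

    #include <libvmi/libvmi.h>
    #include <libvmi/events.h>

    /* Invoked by LibVMI when the monitored page is written. */
    static event_response_t write_cb(vmi_instance_t vmi, vmi_event_t *event)
    {
        (void)vmi; (void)event;
        return VMI_EVENT_RESPONSE_NONE;
    }

    static void monitor_page_writes(vmi_instance_t vmi, addr_t gfn)
    {
        vmi_event_t event = {0};
        SETUP_MEM_EVENT(&event, gfn, VMI_MEMACCESS_W, write_cb, 0);
        vmi_register_event(vmi, &event);

        /* Poll the shared event ring, waiting up to 500 ms for events. */
        vmi_events_listen(vmi, 500);
    }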

The Xen Security Module (XSM) uses FLASK policies, as in SELinux, to enforce Mandatory Access Control (MAC) between different domains. Each permission is denied by default unless explicitly allowed in the policy. Permissions are granted according to the categories a guest domain belongs to, such as its type, role, user, and attributes.

The category of a VM is set by a security label in the configuration file used to create it via xl create <config_file>. For example, a seclabel line of the following form (a sketch; see the xl.cfg manual for the exact syntax):
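
    seclabel='system_u:system_r:domU_t1'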

will label the VM with type domU_t1, role system_r, and user system_u. Type is the lowest level of the hierarchy: multiple types can be grouped under one role, and multiple roles under one user.

Permissions are granted based on the type of a VM. For example, the map_read permission allows a domain to map another domain's memory read-only. A policy rule along the following lines (a sketch in FLASK syntax):
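
    allow domU_t1 domU_t2:mmu { map_read };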

will allow a VM with type domU_t1 to read the memory of another VM with type domU_t2.

In addition to the permissions granted by XSM, we also need permission to read information from Xenstore, which is used to get metadata of the target VM, such as resolving the domain ID from the domain's string name. Xenstore permissions can be inspected with the command xenstore-ls -p.

The meaning of each permission entry can be found in the manual. The command xenstore-chmod can be used to grant read permission to specific VMs. For example, to let the VM with domain ID 8 read the Xenstore directory /local/domain, you can run a command along the following lines (a sketch: n0 keeps Dom0 as the owner with no default access for other domains, and r8 grants read access to domain 8):
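
    xenstore-chmod /local/domain n0 r8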

Build New Libraries into Mini-OS

The next challenge is building new libraries into Mini-OS. Mini-OS is a deliberately minimal operating system. To keep the kernel small, only a few libraries can be built into it: newlib as the C library, Xen-related libraries such as libxc to communicate with the hypervisor, and lwIP for basic networking.

To port LibVMI to Mini-OS, two more libraries are needed: a JSON library to parse Rekall profiles, libjson-c, and a library of utility data structures, GLib.

In theory, most libraries written in C can be built into Mini-OS with the help of newlib; libjson-c is one example. This post introduces how to build new libraries. However, some libraries, such as GLib, need to be manually customized for Mini-OS by removing the unsupported portions.

Furthermore, security applications written in C++ can also be ported to Mini-OS. For example, DRAKVUF is a binary analysis system built on top of LibVMI and Xen, a portion of which is written in C++. To build this code in Mini-OS, we need to cross-compile the C++ standard libraries into the tiny kernel.

Project Status & Results

Functions added to Mini-OS

  • Support for LibVMI functions to introspect Linux and Windows guests on the x86 architecture. Both memory access and event support are implemented (a minimal usage sketch follows this list). The ARM architecture and other OS kernels (such as FreeBSD) have not been explored yet.
  • A customized GLib and a statically compiled libjson-c are cross-compiled into Mini-OS.
  • C++ language support. The C++ standard library from GCC was cross-compiled into static libraries such as libgcc and libstdc++. Now we can program Mini-OS in C++, not only C. Detailed steps can be found in this post.
  • A GitHub documentation site and a blog are maintained to document how to build and run TinyVMI, and to track the progress of each step during the summer.
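
The sketch below shows the kind of LibVMI call sequence that now runs inside Mini-OS just as it does on Linux (a sketch following the API of recent LibVMI releases; the domain name "target-vm" and the use of the Linux init_task symbol are illustrative):

    #include <stdio.h>
    #include <inttypes.h>
    #include <libvmi/libvmi.h>

    int main(void)
    {
        vmi_instance_t vmi = NULL;

        /* Attach to a running Xen domain by name. */
        if (vmi_init_complete(&vmi, (void *)"target-vm", VMI_INIT_DOMAINNAME,
                              NULL, VMI_CONFIG_GLOBAL_FILE_ENTRY,
                              NULL, NULL) == VMI_FAILURE)
            return 1;

        /* Resolve a kernel symbol and read 8 bytes from that address. */
        addr_t va = 0;
        uint64_t value = 0;
        if (vmi_translate_ksym2v(vmi, "init_task", &va) == VMI_SUCCESS &&
            vmi_read_64_va(vmi, va, 0, &value) == VMI_SUCCESS)
            printf("init_task @ 0x%" PRIx64 ": 0x%" PRIx64 "\n", va, value);

        vmi_destroy(vmi);
        return 0;
    }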

Performance Analysis

In order to evaluate the TinyVMI system, we conducted a simple analysis and experiment to show its efficiency. For comparison, we built two domains with LibVMI on the same hypervisor: one guest VM running Mini-OS with LibVMI, and Dom0 running Linux (Ubuntu 16.04) with LibVMI. The target VM being introspected is a 64-bit Linux (Ubuntu 16.04). Results are shown in Fig.2 and Fig.3.

Fig.2 Code Size of LibVMI and Different Kernels

Fig.3 Time in Walking Through Page Table

Fig.2 shows the overall code size of each OS with LibVMI in it. LibVMI with Mini-OS totals 83K lines of code (LoC), while LibVMI with the Linux kernel totals 177K LoC, a reduction of more than 50%. Note that the Linux kernel LoC excludes all driver code and thus reflects the minimal possible size of a Linux kernel; with drivers included, a Linux system could exceed 15M LoC.

Fig.3 shows the time taken to read one page by walking the four levels of the page table while introspecting a 64-bit Linux guest VM, averaged over 500 consecutive page reads. LibVMI in Mini-OS took 3.7 microseconds, while LibVMI in Linux took 5.7 microseconds, a saving of more than 30%.

Conclusion

To briefly conclude the project: we have successfully ported the core functionality of LibVMI into Mini-OS, the tiny OS on Xen. By customizing the XSM policy specifications and Xenstore permissions, a guest VM has been granted permission to introspect another guest VM via VMI techniques. By customizing and cross-compiling static libraries into Mini-OS, we have built LibVMI in a tiny OS, enabling a tiny VM to introspect both Linux and Windows guest VMs. Evaluations show that code size is reduced by more than 50% and performance is improved by more than 30% compared to VMI operations run from Dom0 on the hypervisor.

Future Directions

  • DRAKVUF integration. In the final week of GSoC, C++ language support was added to TinyVMI with the help of this post from Notes to self. The next step would be cross-compiling the DRAKVUF system into TinyVMI. This will enable more applications to take full advantage of the LibVMI interfaces already provided in Mini-OS.
  • Dom0 introspection. We all know Dom0 is huge. Although much work has been done to disaggregate it, it is still huge. TinyVMI itself has a small trusted computing base (TCB). However, we still need to trust Dom0, which enforces the XSM policies, and this enlarges the TCB of the system significantly. And since we have to trust Dom0, there is little point in monitoring Dom0's memory from TinyVMI. A further step in disaggregating Dom0 would be to separate the XSM policy management interface into another subdomain, or into the same domain as TinyVMI. Splitting this out would make it possible to eliminate Dom0 from the trusted computing base and allow TinyVMI to monitor Dom0 itself via VMI techniques.

Acknowledgment

Thanks to my mentors, Steven Maresca and Tamas K Lengyel, for accepting me as a student in GSoC this year. This is my first time at GSoC, and this exciting project could not have been achieved without your prompt, helpful instructions and gracious patience. Thanks to Zibby Keaton for grammar checking this post. Thanks to everyone at Google Summer of Code for providing such a great opportunity for us to explore the world of open source!

 

Killing Processes that Don’t Want to be Killed

This article originally appeared on lwn.net.

Suppose you have a program running on your system that you don’t quite trust. Maybe it’s a program submitted by a student to an automated grading system. Or maybe it’s a QEMU device model running in a Xen control domain ("domain 0" or “dom0”), and you want to make sure that even if an attacker from a rogue virtual machine manages to take over the QEMU process, they can’t do any further harm. There are many things you might want to do to restrict its ability to do mischief. But one thing in particular you probably want is to be able to reliably kill the process once you think it should be done. This turns out to be quite a bit trickier than you’d think.


A Recap of the Xen Project Developer and Design Summit: Community Health, Development Trends, Coding Changes and More

We were extremely thrilled to host our Xen Project Developer and Design Summit in Nanjing Jiangning, China this June. The event brought together our community and power users under one roof to collaborate and to learn more about the future of our project. It also gave us the opportunity to connect with the large part of our community that is based in China. We’ve seen a steady stream of Xen Project hypervisor adoption in this region.

If you were unable to attend the event, we have recordings of the presentations as well as the slides available. Please check them out!

During our event, we always start with a weather report on the Xen Project. It covers areas that we are improving upon, where we need more support, and also the potential direction of the project. This blog covers information from the weather report as well as next steps and focus areas for the project.

Community Health
Code commits for the hypervisor have grown on average by 11% year over year since 2014; commits in the first 5 months of 2018 grew 11% compared to the same period last year. The top 6 contributors to the project since 2011 have been Citrix, SUSE, AMD, Arm, Intel, and Oracle. This is also true for the last 12 months, in which 90% of contributions came from these top 6 players.

However, we have seen a larger than normal volume of contributions from Arm and AMD, which contributed twice as much as in previous years. In addition, EPAM is establishing itself at the top table with its first contributions and a significant number of code reviews, and AWS started making its first significant code contributions in 2017.

Hardware security issues had an impact on the project’s code review process and thus on its capability to take in some code. In other words, x86-related development not directly related to hardware security issues was slowed down, because the developers who normally review contributions had less bandwidth to do so.

This has forced the community to make some changes that are starting to have a positive effect: x86 developers across companies are collaborating more and better, meaning that hardware security issues in 2018 had a smaller impact on the community than those in 2017.

Innovation and Development Trends
Unikraft, a Xen Project sub-project, is on a healthy growth trajectory. Unikraft aims to simplify the process of building unikernels through a unified and customizable code base. It was created after the Xen Project Developer and Design Summit 2017.

The project recently upstreamed a significant amount of functionality, including:

  • Scheduling support and better, more complete support for KVM, Xen, and Linux. Supporting Xen/KVM allows Unikraft to cater to a larger set of potential users and companies. Linux user space provides an excellent development environment: Unikraft users can build their unikernels as Linux executables, use Linux’s wide range of debugging and performance optimization tools, and, when done, simply re-compile as a KVM or Xen unikernel (work on creating x86/Arm bare-metal images is ongoing).
  • Releases of newlib (a libc-like library) and lwip (a network stack). This support allows Unikraft to compile with most applications and is a basic requirement for supporting a potentially wide range of applications.
  • The project is beginning to pick up traction with contributions coming from companies like NEC, Arm, and Oracle.

For more information check out the following two presentations: Unikraft and Unikraft on Arm.

We have been re-writing the x86 core, and we are working on adding support for complex new CPU hardware features such as NVDIMMs and SGX. In addition, we are working on making technologies that security-conscious vendors have used in non-server environments ready for server virtualization and cloud computing; support for measured boot is an example.

Another key innovation is a project called Panopticon, which aims to re-write some portions of the hypervisor to make Xen resilient to all types of side-channel attacks by removing unnecessary information about guests from the hypervisor.

You can find presentations related to these topics here (x86 evolution) and here (side-channel attacks and mitigations).

Continued Growth in Embedded and Automotive
We are seeing continued contributions within the embedded and automotive space to Xen Project Core with new features and functionality, including:

  • Co-processor (GPU) sharing framework enabling virtualization of co-processors such as FPGAs, DRMs, etc.
  • 2nd generation Power management and HPM on Arm  – this enables a huge reduction in power consumption, which is significant for some embedded market segments.
  • RTOS-based Dom0 and code size reduction – this reduces the cost of safety certification significantly and is important for market segments where safety certification matters (such as automotive, avionics, medical, etc.). We already managed to get the Xen code size on Arm below 45K SLOC, and we expect that Dom0 will also be below 50K SLOC. This makes it possible to safety-certify a Xen-based stack to DAL C / ASIL-B/C standards at a cost equivalent to less than 10 years.
  • Improved startup latency to boot multiple VMs in parallel from the device tree – this opens up the use of Xen on small IoT and embedded devices and allows booting a complete Xen system in milliseconds rather than seconds. In addition, it halves the cost of safety certification for systems where a Dom0 is not necessary.

You can see the progress of our re-architecture in our latest release, Xen Project hypervisor 4.11. The following summit presentations are also relevant: here (Xen and automotive at Samsung), here (CPUFreq), and here (real-time support).

These are just a few of the features and updates that make it easier for Xen to be used in embedded environments and market segments where safety certification is relevant. In addition, they will also significantly improve BoM cost and security in other market segments. On x86 we are also reducing code size, but this is significantly harder because of backward compatibility guarantees for x86 hardware and older operating systems.

Conclusion
The event was a great success with a lot of community and technical topics, like “How to Get Your Code Into Xen” and “The Art of Virtualizing Cache Maintenance.” Find the playlist for the full conference here. Additionally, our design sessions focused on architecture, embedded and safety, security, performance, and working practices and processes. You can find what was discussed, and next steps with these areas on our wiki.

If you want to stay abreast of where and when the Xen Project Developer and Design Summit will be held next year, follow us on Facebook and Twitter.


Xen Project Hypervisor: Virtualization and Power Management are Coalescing into an Energy-Aware Hypervisor

Power management in the Xen Project Hypervisor has historically targeted server applications, improving power consumption and heat management in data centers and reducing electricity and cooling costs. In the embedded space, the Xen Project Hypervisor faces very different applications, architectures, and power-related requirements, which focus on battery life, heat, and size.

Although the same fundamental principles of power management apply, the power management infrastructure in the Xen Project Hypervisor requires new interfaces, methods, and policies tailored to embedded architectures and applications. This post recaps Xen Project power management, how the requirements change in the embedded space, and how this change may unite the hypervisor and power manager functions. Read the full article on Linux.com here.

Automotive, Security and the Future of the Xen Project at The Xen Project Developer and Design Summit

The Xen Developer and Design Summit schedule is now live! This conference combines the formats of the Xen Project Developer Summits with the Xen Project Hackathons. If you are part of the Xen Project’s community of developers and power users, come join us in Budapest, Hungary, July 11 – 13 for this must-attend event!


The conference will cover many different topic areas including community, embedded/automotive, performance, tooling, hardware, security and more. The format will include traditional panels and presentations, as well as design and problem-solving sessions.

Design and problem solving session proposals will be accepted until July 7. This is a great way to meet other developers face-to-face to:

  • Discuss and advance the design and architecture of future functionality
  • Coordinate and plan upcoming features
  • Discuss and share best practices and ideas on how to improve community collaboration
  • Hear interactive sessions covering lessons learned from contributors, users and vendors

Submit your design and problem solving ideas here.

Keynotes this year are coming from Lars Kurth, Xen Project Chairperson and Director of Open Source Solutions at Citrix; Oleksandr Andrushchenko, Lead Software Engineer at EPAM Systems; Stefano Stabellini, Virtualization Architect at Aporeto; and Wei Liu, Senior Software Engineer at Citrix.

Here’s a small sampling of other speaking sessions during the conference:

Automotive

  • Dedicated Secure Domain as an Approach for Certification of Automotive Sector Solutions from Iurii Mykhalskyi of GlobalLogic
  • Harmony of CPU Scheduling Between RT Guest OS and Rich Guest OS in Automotive Virtualization from Sangyun Lee of LG Electronics

Security

  • Hypervisor-Based Security: Bringing Virtualized Exceptions Into the Game from Mihai Dontu of Bitdefender
  • Uniprof: Transparent Unikernel Performance Profiling and Debugging from Florian Schmidt of NEC

Future of Xen

  • Intel GVT-g: From Production to Upstream from Zhi Wang of Intel
  • Recent and Ongoing Xen Related Work in the Linux Kernel from Jürgen Groß of SUSE

General Hypervisor

  • Bring up PCI Passthrough on ARM from Julien Grall of ARM
  • EFI Secure Boot, Shim and Xen: Current Status of Developments from Daniel Kiper of Oracle

You can view the entire schedule here. Early bird specials for tickets (price is $250) are available until May 31st.

A special thank you to our Diamond Sponsor Citrix and Gold sponsors ARM, Intel and Superfluidity. We look forward to seeing you at the event in July, and please stay informed on Xen Project updates by following us on social (Twitter and Facebook) and registering to our xen-announce mailing list.

 

Xen Hypervisor to be Rewritten

The hypervisor team has come to the conclusion that using the C programming language, which is 45 years old as of writing, is not a good idea for the long term success of the project.

C, without doubt, is riddled with quirks and undefined behaviours. Even the most experienced developers find this collection of powerful footguns difficult to use. We’re glad that the development of programming languages in the last decade has given us an abundance of better choices.

After a heated debate among committers, we’ve settled on picking two of the most popular languages on HackerNews to rewrite the Xen hypervisor project. Our winners are Rust and JavaScript.

Rust, although not old enough to drink, has attracted significant attention in recent years. The hypervisor maintainers have acquainted themselves with the ownership system, borrow checker, lifetimes and cargo build system. We will soon start rewriting the X86 exception handler entry point, which has been a major source of security bugs in the past, and looks like an easy starting point for the conversion to Rust.

JavaScript has been a cornerstone of web development since the early 2000s. With the advancement of React Native and Electron, plus the exemplary success of the Atom and Visual Studio Code editors, it now makes sense to start rebuilding the Xen hypervisor toolstack in JavaScript. We’re confident that Node.js will be of great help when it comes to performance. And we believe Node.js and the current libxenlight event model are a match made in heaven.

Due to the improved ergonomics of the two programming languages, we expect developer efficiency to be boosted by a factor of 10. We’re also quite optimistic that we can tap into the large talent pool of Rust and JavaScript developers and get significant help from them. We expect the rewrite to be finished and released within the year – by April 2018.

For those who want a more solid, tried and true technology, we are open to the idea of toolstack middleware being written in PHP and frontend JavaScript. But since maintainers are too busy playing with their new shiny toys, those who want PHP middleware will have to step up and help.

Stay tuned and get ready to embrace the most secure and easy-to-use Xen hypervisor ever, on April 1st 2018!

Note that this article was an April fools joke and was entirely made up.