
Google Summer of Code Project, TinyVMI: Porting LibVMI to Mini-OS

This blog post comes from Lele Ma, a Ph.D. student at William and Mary. He was recently a Google Summer of Code Intern working on the Honeynet Project. 

Introduction

This post introduces the project I worked on with the Honeynet Project at Google Summer of Code this year. The TinyVMI project ports a library (LibVMI) into a tiny operating system (Mini-OS). After porting, LibVMI has all of its functionality running inside a tiny virtual machine, which has a much smaller footprint and higher performance than the same library running on a Linux OS.

Mini-OS & Unikernels

Mini-OS is a tiny operating system demo distributed with the source of the Xen Project Hypervisor (abbreviated as Xen below). It has been the basis for the development of several unikernels, such as ClickOS and Rump kernels. A unikernel can be viewed as a minimized operating system with the following features:

  • No ring0/ring3, or kernel/user mode separation. Traditional operating systems, like Linux, separate execution into kernel mode and user mode to prevent malicious users (applications) from accessing kernel memory. In unikernels like Mini-OS, however, there is only one mode: ring0, or kernel mode. This eliminates the burden of context switching between the two modes, reducing both kernel code size and runtime overhead.
  • A minimal set of libraries. Instead of shipping a large set of system/application libraries to provide a general-purpose computing environment, a unikernel aims to be configured with the minimal set of libraries necessary for the application it runs, which is why it is also called a library operating system. For example, Mini-OS can be configured with a libc so users can write applications in C.

Fig.1 General Purpose OS vs. Mini-OS Unikernel

As shown in Fig.1, a unikernel is much smaller in size: it eliminates all unnecessary tools, libraries, and even the file system from the OS, keeping only the application code and a tiny OS kernel. Unikernels can be more efficient than traditional operating systems, especially on cloud platforms where each specialized application is usually managed in a standalone VM. Unikernels are often seen as the basis for the next generation of cloud platforms because they achieve efficiency in several aspects. These include but are not limited to:

  1. Smaller memory footprint. A unikernel requires significantly less memory than a traditional operating system. For example, a Mini-OS VM with the LibVMI application requires only 64MB of main memory, whereas a 64-bit Linux VM would typically be given 4GB of main memory to achieve average performance. The reduced memory footprint allows a single physical machine to host more VMs and reduces the average cost per service.
  2. Faster booting. Since the memory footprint is small and there are no redundant libraries or kernel modules, a tiny OS requires significantly less time to boot than a traditional OS. Booting a tiny OS is just like starting the application itself.
  3. No kernel mode switching. The OS kernel and the application share the same memory region, so CPU context switches caused by system calls are eliminated in unikernels. The runtime performance of a unikernel can therefore be much better than that of a traditional OS.
  4. More secure. Each unikernel VM runs only one application. Isolation between applications is enforced by the hypervisor, instead of a shared OS kernel. Compared to process or container isolation in Linux, the unikernel benefits from this lower-level isolation.
  5. Easy deployment and use. Unikernel applications are built into a single binary that runs directly as a VM image, which simplifies the deployment of the service. All functionality is customized at build time; once deployed, the binary requires no modification other than being replaced wholesale by an updated image.

In brief, Mini-OS is a tiny OS that originates from the Xen Project Hypervisor. Like other unikernels, Mini-OS provides higher performance and a more secure computing environment than a traditional operating system in the cloud.

Why Port LibVMI to Mini-OS?

LibVMI is a security-critical library that can be used to view a target VM’s raw memory from another guest VM, thus gaining a view of almost all activity on the target VM.

Traditionally, LibVMI runs in Dom0 on the hypervisor. However, Dom0 is already very big even without LibVMI in it. I got the idea of separating LibVMI from Dom0 from the following observations:

  1. Dom0 is a general purpose OS hosting many daily use applications, such as administrator tools. However, LibVMI is a special purpose library and usually not for daily use. Furthermore, there are almost no direct communications between LibVMI and other applications. Thus it is not necessary to install LibVMI in Dom0.
  2. Security risk. Dom0 is a critical domain for the hypervisor platform. Introducing a new code base to the kernel would also introduce new security risks. Other applications on Dom0 could leverage kernel vulnerabilities to compromise LibVMI, and vice versa, a bug in LibVMI could crash other applications or even the entire Dom0 kernel.
  3. Performance overhead. As introduced above, a general purpose OS is large and inefficient to run a special purpose application. CPU mode switching, large memory footprints, and process scheduling all introduce overheads for Dom0.

Therefore, we propose porting LibVMI to the tiny Mini-OS, naming the result TinyVMI, to explore whether we can achieve the above benefits.

Challenges

First, the hypervisor isolates guest VMs from reading each other’s memory pages; a guest VM must be granted sufficient permissions before it can introspect another VM’s memory. Second, LibVMI depends on several libraries that are not supported in the original Mini-OS. In this project, we need solutions to both of these challenges.

Permissions in accessing other VM’s memory

To introspect a VM’s memory from another guest VM, the first thing is to get permission from the hypervisor. By default, the memory pages of each VM are strictly isolated: no VM is allowed to access the memory pages of another. Although the hypervisor allows programmers to share memory pages between two VMs via grant tables, this requires the target VM to explicitly offer the page for sharing. Since the entire target VM is untrusted and no changes should be made to it, LibVMI instead uses foreign memory mapping hypercalls to map memory pages of the target VM into its own address space. The permission for a guest VM (or Dom0) to map a foreign page (a target VM’s page) into its own address space is controlled by the Xen Security Module (XSM), which will be introduced below.
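As a rough illustration of the mechanism (a sketch, not TinyVMI’s exact code), mapping one page of a target VM with libxc looks like this, where domid and gfn are assumed to be known:

    #include <sys/mman.h>
    #include <xenctrl.h>

    /* Sketch: map one frame of a target VM read-only, then release it.
     * domid and gfn are assumed known; error handling is omitted. */
    xc_interface *xch = xc_interface_open(NULL, NULL, 0);
    void *page = xc_map_foreign_range(xch, domid, XC_PAGE_SIZE,
                                      PROT_READ, gfn);
    /* ... inspect the target VM's memory through 'page' ... */
    munmap(page, XC_PAGE_SIZE);
    xc_interface_close(xch);

Without the XSM permissions described below, this mapping call simply fails for an unprivileged guest VM.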

Furthermore, Xen event channels allow a guest VM to monitor a target VM’s memory in real time with the help of hardware interrupts. A ring buffer is shared between the hypervisor and the guest kernel to transfer event information; XSM permission must also be granted to access this ring buffer.
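At the library level, LibVMI wraps this event machinery; registering and listening for a memory event looks roughly like the following sketch (the callback cb, the frame number gfn, and the running flag are placeholders):

    #include <libvmi/libvmi.h>
    #include <libvmi/events.h>

    /* Sketch: watch one guest frame for any access and poll for
     * events. cb is a placeholder callback; vmi is initialized. */
    vmi_event_t event = {0};
    SETUP_MEM_EVENT(&event, gfn, VMI_MEMACCESS_RWX, cb, 0);
    vmi_register_event(vmi, &event);
    while (running)
        vmi_events_listen(vmi, 500);  /* wait up to 500 ms */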

The Xen Security Module (XSM) uses FLASK policies, as in SELinux, to enforce Mandatory Access Control (MAC) between different domains. Each permission is denied by default unless explicitly allowed in the policy. Permissions are granted according to the categories a guest domain belongs to, such as its type, role, user, and attributes.

The category of a VM is labeled in the configuration file used to create it via xl create <config_file>. For example, a seclabel line along these lines (illustrative):
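    seclabel='system_u:system_r:domU_t1'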

will label the VM as type domU_t1, under role system_r and user system_u. Type is the lowest level of the category: multiple types can be grouped under one role, and multiple roles under one user.

Permissions are granted based on the types of a VM. For example, the map_read permission allows a domain to map another domain’s memory read-only. A policy rule along the following lines (FLASK syntax, illustrative):
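    allow domU_t1 domU_t2:mmu { map_read };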

will allow a VM with type domU_t1 to read the memory of another VM with type domU_t2.

In addition to the permissions granted by XSM, we also need permission to read from Xenstore, which is used to get metadata of the target VM, such as its domain ID from its string name. Xenstore permissions can be read via the command xenstore-ls -p, whose output looks roughly like this (illustrative):
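    $ xenstore-ls -p /local/domain/1
    name = "target-vm"   (n0,r8)
    ...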

The meaning of each permission flag can be found in the manual. The command xenstore-chmod can be used to grant read permissions to certain VMs. For example, to enable the VM with ID 8 to read the Xenstore directory /local/domain, you could run a command along these lines (illustrative):
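    xenstore-chmod -r /local/domain n0 r8

Here n0 keeps the owner (Dom0) with no default access for other domains, and r8 grants read access to domain 8; -r applies the change recursively.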

Build New Libraries into Mini-OS

The next challenge is building new libraries into Mini-OS. Mini-OS is a deliberately minimal operating system: to keep the kernel small, only a few libraries can be built into it, namely newlib as the C library, Xen-related libraries such as libxc to communicate with the hypervisor, and lwip for basic networking.

To port LibVMI to Mini-OS, two more libraries are needed: a JSON library to parse Rekall profiles, libjson-c, and a library of utility data structures, GLib.

In theory, most libraries written in C, such as libjson-c, can be built into Mini-OS with the help of newlib. This post introduces how to build new libraries. However, some libraries, such as GLib, need to be manually customized for Mini-OS by eliminating the unsupported portions.

Furthermore, security applications written in C++ can also be ported to Mini-OS. For example, DRAKVUF is a binary analysis system built on top of LibVMI and Xen, and a portion of its code is in C++. To build this code in Mini-OS, we need to cross-compile the C++ standard libraries into the tiny kernel.

Project Status & Results

Functions added to Mini-OS

  • Support for LibVMI functions to introspect Linux and Windows guests on the x86 architecture. Both memory access and event support are implemented. The ARM architecture and other OS kernels (such as FreeBSD) have not been explored yet.
  • A customized GLib and a statically compiled libjson-c are cross-compiled into Mini-OS.
  • C++ language support. The C++ standard library from GCC was cross-compiled into static libraries, such as libgcc, libstdc++, etc. Now we can program Mini-OS in C++, not only C! Detailed steps can be found in this post.
  • A GitHub documentation site and a blog are maintained, documenting how to build and run TinyVMI and tracking the progress of each step during the summer.

Performance Analysis

To evaluate the TinyVMI system, we conducted a simple analysis and experiment, comparing two domains with LibVMI on the same hypervisor: a guest VM running Mini-OS with LibVMI, and Dom0 running Linux (Ubuntu 16.04) with LibVMI. The target VM being introspected is a 64-bit Linux (Ubuntu 16.04). Results are shown in Fig.2 and Fig.3.

Fig.2 Code Size of LibVMI and Different Kernels

Fig.3 Time in Walking Through Page Table

Fig.2 shows the overall code size of the OS with LibVMI in it. LibVMI with Mini-OS totaled 83K lines of code (LoC), while LibVMI with the Linux kernel had 177K LoC, a reduction of more than 50%. Note that the LoC count for the Linux kernel does not include any driver code, so it reflects a near-minimal Linux kernel; with drivers included, a Linux system could exceed 15M LoC.

Fig.3 shows the time taken to read one page by walking the four levels of the page table while introspecting a 64-bit Linux guest VM, averaged over 500 consecutive page reads. LibVMI in Mini-OS took 3.7 microseconds, while LibVMI in Linux took 5.7 microseconds, saving more than 30% of the time.
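Conceptually, the measurement is a loop like the following sketch (not the exact benchmark code; vmi is an initialized LibVMI instance, kvaddr a kernel virtual address in the target, and the call uses the current LibVMI API):

    #include <time.h>

    /* Sketch: time 500 consecutive page reads through LibVMI.
     * Each vmi_read_va() with pid 0 walks the guest kernel's
     * page table to translate the virtual address. */
    char buf[4096];
    size_t bytes = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 500; i++)
        vmi_read_va(vmi, kvaddr + i * 4096, 0, 4096, buf, &bytes);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    /* elapsed time / 500 = average per-page read latency */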

Conclusion

To briefly conclude the project, we have successfully ported the core functionality of LibVMI into Mini-OS, the tiny OS on Xen. By customizing the XSM policy specifications and Xenstore permissions, a guest VM has been granted permission to introspect another guest VM via VMI techniques. By customizing and cross-compiling static libraries into Mini-OS, we have built LibVMI in a tiny OS, enabling a tiny VM to introspect both Linux and Windows guest VMs. Evaluations show the code size is reduced by more than 50% and performance is improved by more than 30% compared to VMI operations in Dom0 on the same hypervisor.

Future Directions

  • DRAKVUF integration. During the last week of GSoC, C++ language support was added to TinyVMI with the help of this post from Notes to Self. The next step would be cross-compiling the DRAKVUF system into TinyVMI. This will enable more applications to take full advantage of the LibVMI interfaces already provided in Mini-OS.
  • Dom0 Introspection. We all know Dom0 is huge. Although much work has been done to disaggregate it, it is still huge. TinyVMI itself has a small trusted computing base (TCB). However, we still need to trust Dom0 to enforce the XSM policies, which enlarges the TCB of the system significantly. And since we have to trust Dom0, monitoring Dom0’s memory from TinyVMI is of little use. A further step in disaggregating Dom0 would be to separate the XSM policy management interface into another sub-domain, or into the same domain as TinyVMI. Splitting this out would make it possible to eliminate Dom0 from the trusted computing base and allow TinyVMI to monitor Dom0 itself via VMI techniques.

Acknowledgment

Thanks to my mentors, Steven Maresca and Tamas K Lengyel, for accepting me as a student in GSoC this year. This is my first time at GSoC, and this exciting project could not have been achieved without your prompt, helpful instructions and graceful patience. Thanks to Zibby Keaton for the grammar checking on this post. Thanks to the Google Summer of Code committee for providing such a great opportunity for us to explore the world of open source!


Get an Introduction to Working with the Xen Project Hypervisor and More at Open Source Summit #OSSummit

Open Source Summit is the premier event to get introduced to open source and to learn more about the trends that are surrounding this space. This year’s Open Source Summit will be held in Vancouver, BC from August 29 – 31. The event covers a wide range of topics from blockchain to security to virtualization to containers and much more.

We are very excited to have a few members of the Xen Project attending the conference, and we are hosting a workshop to help folks learn more about using Xen and its related technologies. If you are attending or thinking about going, below is where we will be. Come by and say “hi.”

Xen: The Way of the Panda
Lars Kurth, the chairperson of the Xen Project, is hosting a workshop that will guide you through getting started with the Xen Project Hypervisor. Usually, you will use Xen indirectly, as part of a commercial product, a distro, or a hosting or cloud service. By following this session you will learn how Xen and virtualization work under the hood. The workshop will cover:

  • The Xen architecture and architecture concepts related to virtualization in general;
  • Storage and Networking in Xen;
  • More practically you will learn how to install Xen, create guests and work with them;
  • A detailed look at virtualization modes, boot process and troubleshooting Xen setups;
  • Memory management (ballooning), virtual CPUs, scheduling, pinning, saving/restoring and migrating VMs;
  • If time permits, we will cover some more advanced topics.

Seating is limited for this session. If you would like to attend, be sure to register as soon as possible. The workshop is happening on Wednesday, August 29 from 2:10 – 3:40 pm. Please also follow the preparation guide attached to the talk: you will need to download some software packages on your laptop prior to the session to avoid issues with internet bandwidth.

Disclosure Policies in the World of the Cloud: A Look Behind the Scenes

The tech world does not exist in silos and one security vulnerability can impact an entire ecosystem (case in point Meltdown and Spectre). How do open source projects and companies alike ensure that their security disclosure policies are up to standards, especially in the world of cloud computing?

This session, also led by Lars, will introduce different patterns for managing the disclosure of security vulnerabilities in use today and explore their trade-offs and limitations. Come to listen in on the conversation on Wednesday, August 29 from 12:00 – 12:40 pm.

A New Open Source Technology to Secure Containers for IoT

Containers are extremely convenient for packaging applications and deploying them quickly across the data center. They enable microservices-oriented approaches to the development of complex apps. These technologies are benefiting the data center, but are struggling to find their place at the edge.

Embedded developers need the convenience of containers for deployment while retaining real-time capabilities and supporting mixing and matching of applications with different safety and criticality profiles on the same SoC.

A long-time contributor to the Xen Project, Stefano Stabellini, will be presenting on how ViryaOS is aiming to bring the power of containers to the embedded developer. Stefano will be talking through the proof of concept for this new technology on Wednesday, August 29 from 5:40 – 6:20 pm.

We look forward to seeing you at OSS! If you want to connect with us at the conference, please be sure to reach out to Stefano (@stabellinist) or Lars (@lars_kurth) via Twitter. You can also drop us a line in the comments section.


Xen Project Hypervisor Power Management: Suspend-to-RAM on Arm Architectures

This is the second part of the Xen Project Hypervisor series on power management. The first article focused on how virtualization and power management are coalescing into an energy-aware hypervisor.

In this post, the focus is on a project that was started to lay the foundation for full-scale power management for applications involving the Xen Project Hypervisor on Arm architectures. The group intends to make Xen on Arm’s power management the open source reference design for other Arm hypervisors in need of power management capabilities. Read the full story via Linux.com here.

Xen Project Matrix

Xen Project Hypervisor: Virtualization and Power Management are Coalescing into an Energy-Aware Hypervisor

Power management in the Xen Project Hypervisor has historically targeted server applications, improving power consumption and heat management in data centers to reduce electricity and cooling costs. In the embedded space, the Xen Project Hypervisor faces very different applications, architectures, and power-related requirements, which focus on battery life, heat, and size.

Although the same fundamental principles of power management apply, the power management infrastructure in the Xen Project Hypervisor requires new interfaces, methods, and policies tailored to embedded architectures and applications. This post recaps Xen Project power management, how the requirements change in the embedded space, and how this change may unite the hypervisor and power manager functions. Read the full article on Linux.com here.

Improving the Stealthiness of Virtual Machine Introspection on Xen

This blog post comes from Stewart Sentanoe of the University of Passau. Stewart is a PhD student and he was recently a Google Summer of Code Intern working on the Honeynet Project. 

Project Introduction

Virtual Machine Introspection

Virtual Machine Introspection (VMI) is the process of examining and monitoring a virtual machine from the hypervisor or virtual machine monitor (VMM) point of view. Using this approach, we can get untainted information about the monitored virtual machine. There are three main problems with VMI currently:

  • Semantic gap: How do you interpret low level data into useful information?
  • Performance impact: How big is the overhead?
  • Stealthiness: How do you make the monitoring mechanism hard for the adversary to detect?

This project focused on the third problem, specifically on how to hide breakpoints that have been set. We do not want the adversary to be able to detect whether a breakpoint has been set on some memory address. If they detect the breakpoint, the adversary will most likely abandon the attack and we will learn nothing. By leveraging VMI, we are able to build a high-interaction honeypot where the adversary can do whatever they want with the system. Thus, we can gather as much information as possible, get the big picture of what’s going on in the system, and learn from it.

How DRAKVUF Sets a Breakpoint

DRAKVUF is a virtualization based agentless black-box binary analysis system developed by Tamas K Lengyel. DRAKVUF allows for in-depth execution tracing of arbitrary binaries (including operating systems), all without having to install any special software within the virtual machine used for analysis (https://drakvuf.com and https://github.com/tklengyel/drakvuf).

DRAKVUF implements breakpoints in two ways: using INT3 (the 0xCC opcode) and using Xen altp2m.

These are the steps DRAKVUF takes to inject a breakpoint using INT3 (a minimal sketch of the injection step follows the list):

  1. Inject 0xCC into the target
  2. Mark pages Execute-only in the EPT (Extended Page Tables)
  3. If anything tries to read the page:
    1. Remove 0xCC and mark page R/W/X
    2. Singlestep
    3. Place 0xCC back and mark page X-only
  4. When 0xCC traps to Xen
    1. Remove 0xCC
    2. Singlestep
    3. Place 0xCC back
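
In LibVMI terms, the injection step boils down to something like this sketch (vmi and paddr are assumed; DRAKVUF’s real code also juggles the EPT permissions described above):

    /* Sketch: save the original byte, plant INT3, and restore it
     * later for single-stepping. Error handling is omitted. */
    uint8_t orig, cc = 0xCC;
    vmi_read_8_pa(vmi, paddr, &orig);   /* remember original byte */
    vmi_write_8_pa(vmi, paddr, &cc);    /* inject the breakpoint  */
    /* ... on a read or a trap: restore, single-step, re-inject ... */
    vmi_write_8_pa(vmi, paddr, &orig);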

Sounds good, right? But there is a big problem when plain INT3 is used: the remove-and-single-step window races with other vCPUs, which can read or execute the unprotected page while one vCPU is single-stepping.

To make the breakpoint mechanism work well with multiple vCPUs, DRAKVUF uses Xen altp2m. At the normal runtime of a VM, each guest’s physical memory (GFN – Guest Frame Number) will be mapped one to one to the machine (host) physical memory (MFN – Machine Frame Number) as shown in the image below.

Next, to set a breakpoint, DRAKVUF copies the target page to the end of the guest’s physical memory and adds the trap there. DRAKVUF also creates an empty page (its purpose will be explained later), as shown below.

Now, during runtime, the pointer to the original target is switched to the copy page, as shown below, and the page is marked execute-only.

If a process tries to read those pages, DRAKVUF can simply switch the pointer back to the original, single-step, and then switch the pointer to the copy page again. You might be thinking that if an adversary is able to scan “beyond” the physical memory, the adversary will detect the page that contains the copy. This is where the empty page kicks in: whenever a process tries to read or write the copy page, DRAKVUF simply changes the pointer to the empty page, as shown below.
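In libxc terms, this pointer switching is done through Xen’s altp2m interface; a simplified sketch follows (view setup condensed; target_gfn and shadow_gfn are assumed known):

    /* Sketch: create an altp2m view and remap the target frame to
     * its shadow copy placed at the end of guest memory. */
    uint16_t view = 0;
    xc_altp2m_set_domain_state(xch, domid, true);
    xc_altp2m_create_view(xch, domid, XENMEM_access_rwx, &view);
    xc_altp2m_change_gfn(xch, domid, view, target_gfn, shadow_gfn);
    xc_altp2m_switch_to_view(xch, domid, view);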

Sounds cool, doesn’t it? Of course it is! But there are several problems with this process, which led to this GSoC project. The sections below will cover them piece by piece.

Problems of DRAKVUF

There are three problems that I found during this project:

  1. There is an M:1 relation between the shadow copies and the empty page: if we set breakpoints at two addresses, DRAKVUF creates two shadow copies but only one empty page.
  2. If an adversary is able to write “something” to a shadow copy, the “something” will also appear on the other shadow copy, which can raise their suspicion.
  3. The current implementation of DRAKVUF uses ’00’ for the empty page, but the real behaviour of unbacked memory had never been observed.

Proposed Milestones

There are two milestones for this project:

  1. Create a proof of concept (kernel module) that detects the presence of DRAKVUF by writing “something” to one of the shadow copies and probing the second shadow copy to check for the existence of the “something”
  2. Patch DRAKVUF

The Process and the Findings

At the beginning of this project, I had no idea how to read memory beyond the physical address space, but then I found this article, which describes a function (ioremap) that I used for my kernel module (available here). The drawback is that it requires some debug information generated by DRAKVUF, for example the address of the shadow copy.
https://gist.github.com/scorpionzezz/d4f19fb9cc61ff4b50439e9b1846a17a
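
The core of the module is roughly the following (a sketch; shadow_phys_addr stands for the shadow-copy address taken from DRAKVUF’s debug output):

    #include <linux/io.h>

    /* Sketch: map a physical address that lies beyond normal RAM
     * and read one byte from it inside a kernel module. */
    void __iomem *p = ioremap(shadow_phys_addr, PAGE_SIZE);
    if (p) {
        u8 b = readb(p);  /* read the first byte of the page */
        pr_info("byte at %#lx = %02x\n", shadow_phys_addr, b);
        iounmap(p);
    }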

When I executed the code without the writing part, I got this: https://gist.github.com/scorpionzezz/6e4bdd0b22d5877057823a045c784721

As expected, it gave an empty result. Then I wrote “something” to the first address, in this case the letter ‘A’ (41 in hex). The ‘A’ also appeared at the second address: https://gist.github.com/scorpionzezz/ce6623f1176e99de61617222ceba462a

Bingo! Something fishy there. Alright, then I tried to print more addresses: https://gist.github.com/scorpionzezz/22bdb3c727dd130bb59b28cf717d9bac

Did you see something weird there? Yes, the ‘FF’: unbacked memory actually reads as ‘FF’ instead of ’00’. So an adversary does not even need to write “something” to the empty page; simply detecting ’00’ bytes there reveals the presence of DRAKVUF.

But where does the ‘FF’ come from? Architecturally, all physical addresses within the width reported by CPUID EAX=80000008h, bits 7-0 (more here) are considered “valid”. Linux checks address validity when it sets up the memory page tables (see here), and it is up to the firmware to tell the OS and the hypervisor which ranges are backed by RAM via the E820 map (see here). When a process accesses a physical address that is not backed (assuming a page table entry for it exists), the access goes through the Memory Management Unit (MMU) and then the Platform Controller Hub (PCH). The PCH fails to find matching physical memory: writes are ignored, and reads return all 1s. This behaviour is documented in this Intel document (page 32), and in any case VMI (for now) only works on Intel processors.
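For reference, the reported physical address width can be queried with a few lines of user-space C (a small illustrative program):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        /* CPUID leaf 0x80000008: EAX[7:0] = physical address bits,
         * EAX[15:8] = linear address bits. */
        __get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx);
        printf("physical address bits: %u\n", eax & 0xff);
        return 0;
    }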

Alright, now time to fix this.

The first fix is pretty easy: fill the empty page with ‘FF’ instead of ’00’: https://gist.github.com/scorpionzezz/9853d836b38b82c2961c1d437390c8a3

It solved the simple problem. But now let’s talk about the bigger problem: the writing. The idea is to simply ignore write attempts to the shadow page and to the empty page. For both cases, we can use a feature provided by Xen that emulates the write. Sounds easy, but there was another problem: LibVMI (the library used by DRAKVUF) did not support the write-emulation flag, so I needed to patch it (see here).
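Conceptually, the patched event callback answers write attempts like this (a sketch using LibVMI’s event response flags, not DRAKVUF’s exact code):

    /* Sketch: let Xen emulate the faulting instruction while
     * discarding the value it tried to write to the page. */
    static event_response_t write_cb(vmi_instance_t vmi,
                                     vmi_event_t *event)
    {
        (void)vmi; (void)event;  /* unused in this sketch */
        return VMI_EVENT_RESPONSE_EMULATE_NOWRITE;
    }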

Alright: now whenever a process tries to write to the shadow copy, I just emulate the write: https://gist.github.com/scorpionzezz/3a12bebdd43d5717d671136f0fc0069c

We also need to add a trap to the empty page so we can emulate writes whenever a process tries to write to it as well. https://gist.github.com/scorpionzezz/763cd6b9f257105f2941e104cf6f2d8e

Now, every time a process tries to write to either the empty page or the shadow copy, the written value will not be “stored” in memory. Thus, DRAKVUF is hidden better.

Conclusion

This project increases the stealthiness of DRAKVUF. With a high level of stealthiness, it opens up the potential for a new generation of honeypots, intrusion detection systems, and dynamic malware analysis tools, where it will be hard for the adversary to detect the presence of the monitoring system.

Acknowledgement

Thanks to Tamas K Lengyel and Varga-Perke Balint: you rock! Thank you for your help and patience. Thanks also to Benjamin Taubmann for the support, and of course to Honeynet and Google for GSoC 2018 🙂


Xen Project Announces Schedule for its Annual Developer and Design Summit

Today, we are excited to announce the program and speakers for the Xen Project Developer and Design Summit. The summit brings together developers, engineers, and Xen Project power users for in-person collaboration and educational presentations. The event will take place in Nanjing Jiangning, China from June 20-22, 2018.

This is the fifth annual Xen Project Summit, with presentations and panels focusing on hypervisor performance and development, security, automotive, and much more. The conference will kick off with a weather report from Lars Kurth, chairperson of the Xen Project and director of open source at Citrix.

At last year’s Xen Project Developer Summit in Budapest, Hungary.

A sample of presentations include:

  • Sung-Min Lee, principal engineer at Samsung Electronics, will present a production-ready automotive virtualization solution with Xen.
  • Marek Marczykowski-Górecki, senior systems developer, Invisible Things Lab, will present on linux-based device model stubdomains in Qubes OS.
  • Julien Grall, senior software virtualization engineer at Arm, will share capabilities that were added to the latest revision of the Armv7-A architecture and how Arm has been improving virtualization support with incremental versions of the Armv8 architecture.
  • Felipe Huici, chief researcher at NEC, and Florian Schmidt, research scientist at NEC, will co-present on Unikraft, a sub-project of the Xen Project aimed at automating the process of building customized unikernels tailored to specific applications.
  • Bo Zhang, business analyst at Huawei, will introduce Huawei Cloud’s optimizations on the Xen platform to solve problems that regularly occur in customer scenarios.

You can view the full schedule here.

Beyond panels and presentations, the Xen Project will be running design sessions that share a similar format to Xen Project hackathons. Attendees of the conference have the opportunity to propose design sessions during the conference. Current design topics include Making Safety Certifications for Xen Easier; From Hobbyist to Maintainer: Why and How; and Reworking x86 in Xen (Current and Future Plans).

If you’ve never attended a Xen Project Developer and Design Summit, check out last year’s presentations to get a better feel for the event.

A special thank you to Citrix for being a diamond sponsor of the summit.