Xen Project Hypervisor Power Management: Suspend-to-RAM on Arm Architectures

This is the second part of the Xen Project Hypervisor series on power management. The first article focused on how virtualization and power management are coalescing into an energy-aware hypervisor.

In this post, the focus is on a project that was started to lay the foundation for full-scale power management for applications involving the Xen Project Hypervisor on Arm architectures. The group intends to make Xen on Arm’s power management the open source reference design for other Arm hypervisors in need of power management capabilities. Read the full story via Linux.com here.

Xen Project Matrix

Xen Project Hypervisor: Virtualization and Power Management are Coalescing into an Energy-Aware Hypervisor

Power management in the Xen Project Hypervisor historically targets server applications to improve power consumption and heat management in data centers reducing electricity and cooling costs. In the embedded space, the Xen Project Hypervisor faces very different applications, architectures and power-related requirements, which focus on battery life, heat, and size.

Although the same fundamental principles of power management apply, the power management infrastructure in the Xen Project Hypervisor requires new interfaces, methods, and policies tailored to embedded architectures and applications. This post recaps Xen Project power management, how the requirements change in the embedded space, and how this change may unite the hypervisor and power manager functions. Read the full article on Linux.com here.

Xen Project 4.8.4 is available!

I am pleased to announce the release of the Xen 4.8.4. Xen Project maintenance releases are released in line with our Maintenance Release Policy. We recommend that all users of the 4.8 stable series update to the latest point release.

The release is available from its git repositories

xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.8 (tag RELEASE-4.8.4)

or from the Xen Project download page

www.xenproject.org/downloads/xen-archives/xen-project-48-series/xen-484.html

This release contains many bug fixes and improvements. For a complete list of changes, please check the lists of changes on the download page.

What’s New in the Xen Project Hypervisor 4.11

I am pleased to announce the release of the Xen Project Hypervisor 4.11. One of our long-term development goals since the introduction of Xen Project Hypervisor 4.8 has been to create a cleaner architecture for core technology, less code and a smaller computing base for security and performance. The Xen 4.11 release has followed this approach by delivering more PVH related functionality: PVH Dom0 support is now available as experimental feature and support for running unmodified PV guests in a PVH Container has been added. In addition, significant chunks of the ARM port have been rewritten.

Mitigations against Cache Side-channel Attacks

This release contains mitigations for the Meltdown and Spectre vulnerabilities. It is worth noting that we spent a significant amount of time on completing and optimizing fixes for Meltdown and Spectre vulnerabilities. Xen 4.11 contains the following mitigations.

XPTI

We implemented performance optimized XPTI, Xen’s equivalent to KPTI. It is worth noting that only “classic PV” guests need XPTI whereas HVM and PVH can’t attack the hypervisor via Meltdown.

Branch Predictor Hardening

For x86 CPUs, we added a new framework for Intel and AMD microcode related to Spectre mitigations as well as support for Retpoline. By default, Xen will pick the most appropriate mitigations based on compiled in support, loaded microcode, and hardware details, and will virtualise appropriate mitigations for guests to use. Command line controls via the spec-ctrl command line option are available. SP4 (Speculative Store Bypass) mitigations are also available to enable guest software to protect against within-guest information leaks via spec-ctrl=ssbd. In addition, mitigation for Lazy FP state restore (INTEL-SA-00145) are available via spec-ctrl=eager-fpu.

Arm32: Mitigation for Cortex-A15, Cortex-A12, Cortex-A17 are present in Xen 4.7 and later with some caveats (update on the firmware).

Arm64: A PSCI-based mitigation framework for Spectre type vulnerabilities was introduced including concrete mitigations for Cortex-A57, A72, A73 and A75 CPUs for Xen 4.7 to Xen 4.9. An SMCCC 1.1 based mitigation is available for Cortex-A57, Cortex-A72, Cortex-A72, Cortex-A75 for Xen 4.10 and later.

PVH related Features

A key motivation behind PVH was to combine the best of PV and HVM mode, to simplify the interface between operating systems with Xen Support and the Xen Hypervisor and to reduce the attack surface of Xen. This led to the current implementation of PVH. PVH guests are lightweight HVM guests which use Hardware virtualization support for memory and privileged instructions, PV drivers for I/O and native operating system interfaces for everything else. PVH also does not require QEMU.

PVH Dom0

Xen 4.11 adds experimental PVH Dom0 support by calling Xen via dom0=pvh on the command line. Up to now, the only guest type that was capable running as Dom0, were PV guests. HVM guests require QEMU to run in Dom0 to provide some emulated services to the guest, which makes HVM guests unsuitable to run as Dom0 as QEMU is not running when Dom0 boots. PVH guests, in contrast, require no support from anything other than the hypervisor, so it can boot with no other guests running and can take on the responsibilities of Dom0. Running a PVH Dom0 increases security of Xen based systems by removing approximately 1.5 million lines of QEMU code from Xen’s trusted computing base.

Note that enabling a PVH Dom0 requires a PVH Dom0 capable Linux or FreeBSD. Patches for each operating system have been developed and are currently being upstreamed and should be available in the next Linux and FreeBSD versions.

PCI config space emulation in Xen

In Xen 4.11 support for the PCI configuration space has been moved from QEMU to the Hypervisor. Besides enabling PVH Dom0 support, this code will eventually also be available to HVM guests and PVH guests: however, additional security hardening needs to be performed before exposing such functionality to security supported guest types such as PVH or HVM guests.

PV in PVH container (or short: PVH Shim)

Support to run unmodified legacy PV-only guest to be run in PVH mode has been added in Xen 4.11. This allows cloud providers to support old, PV-only distros while only providing support for a single kind of guest (PVH). This simplifies management, reduces the surface of attack significantly, and eventually allows end-users to build a Xen hypervisor configuration with no “classic” PV support at all.

Next Steps

In subsequent releases, you should expect PVH Dom0 to become a supported feature and for PCI passthrough to be enabled in PVH guests. In addition, we will add the capability to compile PV-only and HVM-only versions of Xen.

Other Features

Scheduler Optimizations: Credit1 and Credit2 scheduling decisions when a vCPU is exclusively pinned to a pCPU or when soft-affinity is used have been performance optimised.

Add DMOPs to allow use of VGA with restricted QEMU (x86): Xen 4.9 introduced the Device Model Operation Hypercall (DMOPs) which significantly limits the capability of a compromised QEMU to attack the hypervisor. In Xen 4.11 we added DMOPs that enable the usage of the VGA console, which was previously restricted.

Enable Memory Bandwidth Allocation in Xen (Intel Skylake or newer): Xen 4.11 adds support for Memory Bandwidth Allocation (MBA), that allows Xen to slow misbehaving VMs by using a credit-based throttling mechanism.

Emulator enhancements (x86): support for previously unsupported Intel AVX and AVX2, and for AMD F16C, FMA4, FMA, XOP and 3DNow! instructions have been added to to the x86 emulator.

Guest resource mapping (x86): support for directly mapping Grant tables and IOREQ server pages have been introduced into Xen to improve performance.

Clean-up and future-proofing (Arm): Xen’s VGIC support has been re-implemented. In addition, stage-2 page table handling, memory subsystems and big.LITTLE support have been refactored to make it easier to maintain and update the code in future.

Support for PSCI 1.1 and SMCCC 1.1 compliance (Arm): Xen has been updated to comply with the latest versions of the Arm® Power State Coordination Interface and Secure Monitor Call Calling Conventions that provides an optimised calling convention and optional, discoverable support for mitigating Spectre Variant 2.

Summary

This release contains 1206 commits from 406 patch series. Contributions for this release of the Xen Project came from Citrix, Suse, ARM, AMD, Intel, Amazon, Gentoo Linux, Google, Invisible Things Lab, Oracle, EPAM Systems, Huawei, DornerWorks, Qualcomm, and a number of universities and individuals.

As in Xen 4.10, we took a security-first approach for Xen 4.11 and spent a lot of energy to improve code quality and harden security. Our efforts are not restricted to the current release, but include Xen 4.6 – 4.10: due to mitigations for side-channel attacks an unusually large number of commits – 765 in total – were back-ported to older releases to ensure that users of these releases are not impacted. Despite the disruption caused by Spectre and Meltdown, the community developed several major features and made significant progress towards completing PVH.

On behalf of the Xen Project Hypervisor team, I would like to thank everyone for their contributions (either in the form of patches, code reviews, bug reports or packaging efforts) to the Xen Project.

Please check our acknowledgement page, which recognises all those who helped make this release happen. The source can be located in the tree (tag RELEASE-4.11.0) or can be downloaded as a tarball from our website.

For detailed download and build instructions check out the guide on building Xen 4.11.

More information can be found at

Xen Project 4.7.6 is available!

I am pleased to announce the release of the Xen 4.7.6. Xen Project maintenance releases are released in line with our Maintenance Release Policy. We recommend that all users of the 4.7 stable series update to the latest point release.

The release is available from its git repositories

xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=refs/heads/stable-4.7 (tag RELEASE-4.7.6)

or from the Xen Project download page

www.xenproject.org/downloads/xen-archives/xen-project-47-series/xen-476.html

This release contains many bug fixes and improvements. For a complete list of changes, please check the lists of changes on the download page.

Improving the Stealthiness of Virtual Machine Introspection on Xen

This blog post comes from Stewart Sentanoe of the University of Passau. Stewart is a PhD student and he was recently a Google Summer of Code Intern working on the Honeynet Project. 

Project Introduction

Virtual Machine Introspection

Virtual Machine Introspection (VMI) is the process of examining and monitoring a virtual machine from the hypervisor or virtual machine monitor (VMM) point of view. Using this approach, we can get the untainted information of the monitored virtual machine. There are three main problems with VMI currently:

  • Semantic gap: How do you interpret low level data into useful information?
  • Performance impact: How big is the overhead?
  • Stealthiness: How to make the monitoring mechanism hard to be detected by the adversary?

This project focused on the third problem, and specifically on how to hide the breakpoint that has been set. We do not want the adversary to be able to detect whether there is a breakpoint that has been set to some of the memory addresses. If they are able to detect the breakpoint, most likely the adversary will not continue the attack and we will learn nothing. By leveraging VMI, we are able to build high interaction honeypot where the adversary can do whatever they want with the system. Thus, we can gather as much information as we can and we get the big picture of what’s going on in the system and learn from it.

Setting a Breakpoint Implemented by Drakvuf

DRAKVUF is a virtualization based agentless black-box binary analysis system developed by Tamas K Lengyel. DRAKVUF allows for in-depth execution tracing of arbitrary binaries (including operating systems), all without having to install any special software within the virtual machine used for analysis (https://drakvuf.com and https://github.com/tklengyel/drakvuf).

There are two ways to set a breakpoint implemented by DRAKVUF using INT3 (0xCC opcode) and Xen altp2m.

These are the following steps by DRAKVUF to inject breakpoint using INT3:

  1. Inject 0xCC into the target
  2. Mark pages Execute-only in the EPT (Extended Page Tables)
  3. If anything tries to read the page:
    1. Remove 0xCC and mark page R/W/X
    2. Singlestep
    3. Place 0xCC back and mark page X-only
  4. When 0xCC traps to Xen
    1. Remove 0xCC
    2. Singlestep
    3. Place 0xCC back

Sounds good right? But, there is a big problem when INT3 is used.

To make the breakpoint mechanism work well with multiple vCPUs, DRAKVUF uses Xen altp2m. At the normal runtime of a VM, each guest’s physical memory (GFN – Guest Frame Number) will be mapped one to one to the machine (host) physical memory (MFN – Machine Frame Number) as shown in the image below.

Next, to set a breakpoint, DRAKVUF will copy the target page to the end of the guest’s physical memory and add the trap there. DRAKVUF will also make an empty page (the purposes will be explained later) as shown below.

Now, during the runtime, the pointer of the original target will be switched h to the copy page as shown below and marked as execute only.

If a process tries to execute those pages, it can simply switch the pointer back to the original, single step and then switch the pointer to the copy page again. You might be thinking that if an adversary is able to scan “beyond” the physical memory, the adversary will detect a page that contains the copy. This where the empty page kicks in, whenever a process tries to read or write to the copy page, DRAKVUF will simply change the pointer to the empty page as shown below.

Sounds cool doesn’t it? Of course it is! But, there are several problems with this process, which led to this GSOC project. The sections below will cover them piece by piece.  

Problems of DRAKVUF

There are three problems that I found out during this project:

  1. There’s a M:1 relation between the shadow copy and the empty page, which means that if we set breakpoints to two addresses, it will create two shadow copy and only one empty page.
  2. If an adversary is able to write “something” to a shadow copy, the “something” will also appear on the other shadow copy which can raise their suspicious level.
  3. The current implementation of DRAKVUF will use ’00’  for the empty page, but the real behaviour never been observed.

Proposed Milestones

There are two milestones for this project:

  1. Create a proof of concept (kernel module) that detects the presence of DRAKVUF by trying to write “something” to one of the shadow copy and probe the second shadow copy to check the existence of the “something”
  2. Patch DRAKVUF

The Process and the Findings

At the beginning of this project, I had no idea how to read the memory beyond the physical address space, but then I found this article which describes a function (ioremap) that I used for my kernel module (available here). The drawback is that it requires some debug information generated by DRAKVUF, for example the address of the shadow copy.
https://gist.github.com/scorpionzezz/d4f19fb9cc61ff4b50439e9b1846a17a

When I executed the code without the writing part, I got this: https://gist.github.com/scorpionzezz/6e4bdd0b22d5877057823a045c784721

As expected, it gave me empty result. Then, when I wrote “something” to the first address which in this point is letter ‘A’ (in hex is 41). The ‘A’ also appears on the second address: https://gist.github.com/scorpionzezz/ce6623f1176e99de61617222ceba462a

Bingo! Something fishy there. Alright, then I tried to print more addresses: https://gist.github.com/scorpionzezz/22bdb3c727dd130bb59b28cf717d9bac

Did you see something weird there? Yes, the ‘FF’, actually the empty is ‘FF’ instead ’00’. So actually, an adversary does not need to write “something” to the empty page, it just simply detects if there are ’00’ then it reveals the presence of DRAKVUF.

But where is the ‘FF’ comes from? Architecturally, all physical addresses defined by CPUID EAX=80000008h bits 15-8 (more here) are considered “valid” In Linux, it checks the address validity when it sets up the memory page table (see here). It is up to the firmware to tell the OS and the hypervisor what range are valid with the E820 map (see here). When a process requests a memory address that is not valid (assuming the new page table is made), it goes through the Memory Management Unit (MMU) and then Platform Controller Hub (PCH). The PCH tries to find the valid physical memory but could not found it then, if it involves write, the written value will be ignored and if it involves read, it will return all 1s. This behaviour is written into this (page 32) Intel document and anyway VMI (for now) just works on Intel processor.

Alright, now time to fix this.

First is pretty easy where I just write ‘FF’ to the shadow page: https://gist.github.com/scorpionzezz/9853d836b38b82c2961c1d437390c8a3

It solved the simple problem. But now let’s talk about the bigger problem about the writing. The idea is to simply ignore write attempt to the shadow page and also to the empty page. For both cases, we can use features provided by Xen, which emulate the write. Sounds easy, but actually there was another problem: LibVMI (library that used by DRAKVUF) does not support the write emulation flag, so I needed to patch it up (see here).

Alright, now I check whenever a process tries to write to the shadow copy, then just do the emulation: https://gist.github.com/scorpionzezz/3a12bebdd43d5717d671136f0fc0069c

Now, we also need to add TRAP to the shadow copy so we can also do emulation whenever a process tries to write to it. https://gist.github.com/scorpionzezz/763cd6b9f257105f2941e104cf6f2d8e

Now every time a process tries to write to either the empty page and the shadow copy, the written value will be not “stored” in the memory. Thus, it hides DRAKVUF better.

Conclusion

This project increases the stealthiness level of DRAKVUF. With a high level of stealthiness, it opens up the potential for a new generation honeypots, intrusion detection systems and dynamic malware analysis where it will be hard for the adversary to detect the presence of the monitoring system.

Acknowledgement

Thanks to Tamas K Lengyel and Varga-Perke Balint you rock! Thank you for your help and patience. Thank you also for Benjamin Taubmann for the support and of course Honeynet and Google for GSOC 2018 🙂