Category Archives: Security

PV Calls: a new paravirtualized protocol for POSIX syscalls

Let’s take a step back and look at the current state of virtualization in the software industry. X86 hypervisors were built to run a few different operating systems on the same machine. Nowadays they are mostly used to execute several instances of the same OS (Linux), each running a single server application in isolation. Containers are a better fit for this use case, but they expose a very large attack surface. It is possible to reduce that attack surface, but it is a very difficult task, one that requires intimate knowledge of the app running inside; at any scale it becomes a formidable challenge. The 15-year-old hypervisor technologies, principally designed for RHEL 5 and Windows XP, are more a workaround than a solution for this use case. We need to bring them to the present and take them into the future by modernizing their design.

The typical workload we need to support is a Linux server application packaged to be self-contained, complying with the OCI Image Format or the Docker Image Specification. The app comes with all required userspace dependencies, including its own libc. It makes syscalls to the Linux kernel to access resources and functionality. This is the only interface we must support.

Many of these syscalls closely correspond to function calls which are part of the POSIX family of standards. They have well-known parameters and return values. POSIX stands for “Portable Operating System Interface”: it defines an API available on all major Unixes today, including Linux. POSIX is large to begin with, and Linux adds its own set of non-standard calls on top of it. As a result a Linux system exposes a very high number of calls and, inescapably, also a high number of vulnerabilities. It is wise to restrict syscalls by default. Linux containers struggle with this, but hypervisors are very good at it; after all, hypervisors don’t need to have full POSIX compatibility. By paravirtualizing hardware interfaces, Xen provides powerful functionality with a small attack surface. But PV devices are the wrong abstraction layer for Docker apps: they cause duplication of functionality between the guest and the host. For example, the network stack is traversed twice, first in DomU then in Dom0. This is unnecessary. It is better to raise the level of hypervisor abstractions by paravirtualizing a small set of syscalls directly.
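
To make the “restrict syscalls by default” point concrete, here is a minimal sketch of what a syscall allow-list looks like on Linux with seccomp-BPF. The three-call allow-list is purely illustrative; a real container profile has to enumerate hundreds of calls, which is exactly why getting it right is so hard:

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stddef.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int install_allowlist(void)
    {
        struct sock_filter filter[] = {
            /* Load the syscall number; a real filter must also check
             * seccomp_data.arch to avoid cross-architecture confusion. */
            BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
                     offsetof(struct seccomp_data, nr)),
            /* Tiny illustrative allow-list: read, write, exit_group. */
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_read, 3, 0),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_write, 2, 0),
            BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_exit_group, 1, 0),
            /* Anything else kills the process. */
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(filter) / sizeof(filter[0]),
            .filter = filter,
        };

        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
            return -1;
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
    }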

PV Calls

It is far easier and more efficient to write paravirtualized drivers for syscalls than to emulate hardware, because syscalls sit at a higher level and are made for software. I wrote a protocol specification called PV Calls to forward POSIX calls from DomU to Dom0, and a couple of prototype Linux drivers for it that work at the syscall level. The initial set of calls covers socket, connect, accept, listen, recvmsg, sendmsg and poll. The frontend driver forwards syscall requests over a ring; the backend implements the syscalls, then returns success or failure to the caller. The protocol creates a new ring for each active socket, and the ring size is configurable on a per-socket basis. Incoming data is copied to the ring by the backend, while outgoing data is copied to the ring by the frontend. An event channel per ring is used to notify the other end of any activity. This tiny set of PV Calls is enough to provide networking capabilities to guests.
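
The authoritative wire format is in the specification posted to xen-devel. Purely as an illustration of the idea, a request/response pair on the command ring might look roughly like this; the names and layout below are hypothetical simplifications, not the actual spec:

    #include <stdint.h>

    /* Hypothetical, simplified PV Calls command-ring messages; see the
     * protocol specification on xen-devel for the real layout. */
    struct pvcalls_request {
        uint32_t req_id;            /* matched against the response */
        uint32_t cmd;               /* SOCKET, CONNECT, ACCEPT, ... */
        union {
            struct {
                uint64_t id;        /* opaque per-socket identifier */
                uint32_t domain;    /* e.g. AF_INET */
                uint32_t type;      /* e.g. SOCK_STREAM */
                uint32_t protocol;
            } socket;
            struct {
                uint64_t id;
                uint8_t  addr[28];  /* flattened struct sockaddr */
                uint32_t len;
                uint32_t ref;       /* grant ref of the per-socket data ring */
                uint32_t evtchn;    /* event channel for data notifications */
            } connect;
        } u;
    };

    struct pvcalls_response {
        uint32_t req_id;
        uint32_t cmd;
        int32_t  ret;               /* 0 on success or -errno, as in POSIX */
    };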

We are still running virtual machines, but mainly to restrict the vast majority of the application’s syscalls to a safe and isolated environment. The guest operating system kernel, which is provided by the infrastructure (it doesn’t come with the app), implements syscalls for the benefit of the server application. Xen gives us the means to exploit hardware virtualization extensions to create strong security boundaries around the application. Xen PV VMs enable this approach to work even when virtualization extensions are not available, such as on top of Amazon EC2 or Google Compute Engine instances.

This solution is as secure as Xen VMs but efficiently tailored to container workloads. Early measurements show excellent performance. It also provides a couple of less obvious advantages. In Docker’s default networking model, containers’ communications appear to be made from the host IP address and containers’ listening ports are explicitly bound to the host. PV Calls are a perfect match for it: outgoing communications are made from the host IP address directly and listening ports are automatically bound to it. No additional configuration is required.

Another benefit is ease of monitoring. One of the key aspects of hardening Linux containers is keeping applications under constant observation with logging and monitoring. We should not ignore it even though Xen provides a safer environment by default. PV Calls forward networking calls made by the application to Dom0. In Dom0 we can trivially log them and detect misbehavior. More powerful (and expensive) monitoring techniques like memory introspection offer further opportunities for malware detection.

PV Calls are unobtrusive. No changes to Xen are required as the existing interfaces are enough. Changes to Linux are very limited as the drivers are self-contained. Moreover, PV Calls perform extremely well! Let’s take a look at a couple of iperf graphs (higher is better):

iperf client

iperf server

The first graph shows network bandwidth measured by running an iperf server in Dom0 and an iperf client inside the VM (or container in the case of Docker). PV Calls reach 75 Gbit/s with 4 threads, far better than netfront/netback.

The second graph shows network bandwidth measured by running an iperf server in the guest (or container in the case of Docker) and an iperf client in Dom0. In this scenario PV Calls reach 55 Gbit/s and outperform not just netfront/netback but even Docker.

The benchmarks were run on an Intel Xeon D-1540 machine with 8 cores (16 threads) and 32 GB of RAM. Xen is 4.7.0-rc3 and Linux is 4.6-rc2. Dom0 and DomU have 4 vCPUs each, pinned. DomU has 4 GB of RAM.

For more information on PV Calls, read the full protocol specification on xen-devel. You are welcome to join us and participate in the review discussions. Contributions to the project are very appreciated!

Virtual Machine Introspection: A Security Innovation With New Commercial Applications

This article from Lars Kurth, the Xen Project chairperson, was first published on Linux.com.

A few weeks ago, Citrix and Bitdefender launched XenServer 7 and Bitdefender Hypervisor Introspection, which together compose the first commercial application of the Xen Project Hypervisor’s Virtual Machine Introspection (VMI) infrastructure. In this article, we will cover why this technology is revolutionary and how members of the Xen Project Community and open source projects that were early adopters of VMI (most notably LibVMI and DRAKVUF) collaborated to enable this technology.

Evolving Security Challenges in Virtual Environments

Today, malware executes in the same context and with the same privileges as the anti-malware software that is supposed to stop it, and the problem is getting worse. The Walking Dead analogy I introduced in this Linux.com article is again helpful. Let’s see how traditional anti-malware software fits into the picture and whether our analogy extends to it.

In the Walking Dead universe, Walkers have taken over the earth, feasting on the remaining humans. Walkers are active all the time, and attracted by sound, eventually forming a herd that may overrun your defences. They are strong, but are essentially dumb. As we explored in that Linux.com article, people make mistakes, so we can’t always keep Walkers out of our habitat.

For this analogy, let’s equate Walkers with malware. Let’s assume our virtualized host is a village consisting of individual houses (VMs), while the hypervisor and network provide the infrastructure (streets, fences, electricity, …) that binds the village together.

Enter the world of anti-malware software: assume the remaining humans have survived for a while and re-developed technology to identify Walkers fast, destroy them quickly and fix any damage caused. This is the equivalent of patrols, CCTV, alarmed doors/windows and other security equipment, troops to fight Walkers once discovered, and a clean-up crew to fix any damage. Unfortunately, the reality is that traditional malware security technology can only be deployed within individual houses (aka VMs) and not on the streets of our village.

To make matters worse, until recently malware was relatively dumb. This has changed dramatically in the last few years: our Walkers have evolved into Wayward Pines’ Abbies, which are faster, stronger and more intelligent than Walkers. In other words, malware is now capable of evading or disabling our security mechanisms.

What we need is the equivalent of satellite surveillance to observe the entire village, and laser beams to remotely destroy attackers when they try to enter our houses. We can of course also use this newfound capability to quickly deploy ground troops and clean-up personnel as needed. In essence, that is the promise of Virtual Machine Introspection. It allows us to address security issues from outside the guest OS, without relying on functionality that can be rendered unreliable from the ground. More on that topic later.

From VMI in Xen to the First Commercial Application: A Tale of Collaboration

The development of Virtual Machine Introspection and its applications show how the Xen Project community is bringing revolutionary technologies to market.

The idea of Virtual Machine Introspection for the Xen Project Hypervisor hatched at Georgia Tech in 2007, building on research by Tal Garfinkel and Mendel Rosenblum in 2003. The technology was first incorporated into the Xen Project Hypervisor via the XenAccess and mem-events APIs in 2009. To some degree, this was a response to VMware’s VMsafe technology, which was introduced in 2008 and deprecated in 2012, as the technology had significant limitations at scale. VMsafe was replaced by vShield, an agent-based, hypervisor-facilitated, file-system anti-virus solution that is effectively a subset of VMsafe.

Within the Xen Project software, however, Virtual Machine Introspection technology lived on, due to strong research interest and specialist security applications where trading off performance against security was acceptable. This eventually led to the creation of LibVMI (2010), which made these APIs more accessible. This provided an abstraction that eventually allowed a subset of Xen’s VMI functionality to be exposed to other open source virtualization technologies such as KVM and QEMU.

In May 2013, Intel launched its Haswell generation of CPUs, capable of maintaining up to 512 EPT pointers from the VMCS via the #VE and VMFUNC extensions. This proved to be a potential game-changer for VMI, enabling hypervisor-controlled, hardware-enforced strong isolation between VMs with lower overheads than before, and it led to a collaboration of security researchers and developers from Bitdefender, Cisco, Intel, Novetta, TU Munich and Zentific. From 2014 to 2015, the XenAccess and mem-events APIs were re-architected into the Xen Project Hypervisor’s new VMI subsystem, altp2m and other hardware capabilities were added, as was support for ARM CPUs, and a production-ready baseline was released in Xen 4.6.
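
To give a flavour of what the re-architected interface enables: from Xen 4.6 a privileged domain can drive altp2m through libxenctrl, creating an alternate EPT view in which selected guest frames are watched. The sketch below assumes the Xen 4.6 xc_altp2m_* calls; error handling is omitted:

    #include <xenctrl.h>

    /* Sketch (Xen 4.6): create an alternate p2m view in which one guest
     * frame is read/execute only, so writes to it trap to a monitor. */
    int watch_gfn(uint32_t domid, xen_pfn_t gfn)
    {
        xc_interface *xch = xc_interface_open(NULL, NULL, 0);
        uint16_t view_id;

        xc_altp2m_set_domain_state(xch, domid, 1);     /* enable altp2m */
        xc_altp2m_create_view(xch, domid, XENMEM_access_rwx, &view_id);
        xc_altp2m_set_mem_access(xch, domid, view_id, gfn, XENMEM_access_rx);
        xc_altp2m_switch_to_view(xch, domid, view_id); /* activate the view */

        xc_interface_close(xch);
        return 0;
    }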

Citrix and Bitdefender collaborated to bring VMI technology to market: XenServer 7.0 introduced its Direct Inspect APIs, built on the Xen Project’s VMI interface. It securely exposes the introspection capabilities to security appliances, as implemented by Bitdefender HVI.

What Can Actually Be Done Today?

Coming back to our analogy: what we need is the equivalent of satellite surveillance to observe the entire village. Does VMI deliver? In theory, yes: VMI makes it possible to observe the state of any virtual machine (a house and its surroundings), including memory and CPU state, and to receive events when the state of the virtual machine changes (i.e. if there is any movement). In practice, the performance overhead of doing all this is far too high, despite using hardware capabilities.

In our imagined world that is overrun by Walkers and Abbies, this is equivalent to not having the manpower to monitor everything, which means we have to use our resources to focus on high value areas. In other words, we need to focus on the suspicious activity on system perimeters (the immediate area surrounding each of our houses).

In practice, this focus means monitoring sensitive memory areas for suspicious activity. When malicious activity is detected, a solution can take corrective action on the process state (block, kill) or the VM state (pause, shutdown), while collecting and reporting forensic details directly from the running VM.
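
As a rough illustration of the corrective-action side, this is approximately what a dom0 monitor can do with plain libxenctrl once its detection logic has flagged a guest; the detection logic itself is the hard part and is elided here:

    #include <xenctrl.h>
    #include <xen/sched.h>

    /* Sketch: freeze a flagged guest, snapshot vCPU 0 for forensics,
     * then either resume it or power it off. Error handling omitted. */
    void respond(xc_interface *xch, uint32_t domid, int kill_it)
    {
        vcpu_guest_context_any_t ctxt;

        xc_domain_pause(xch, domid);               /* stop the world */
        xc_vcpu_getcontext(xch, domid, 0, &ctxt);  /* registers for the log */
        /* ... report forensic details from the paused guest ... */

        if (kill_it)
            xc_domain_shutdown(xch, domid, SHUTDOWN_poweroff);
        else
            xc_domain_unpause(xch, domid);
    }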

Think of a laser beam on our satellite that is activated whenever an Abbie or Walker approaches our house. In technical terms, the satellite and laser infrastructure maps to XenServer’s Direct Inspect API, while the software which controls and monitors our data maps onto Bitdefender’s Hypervisor Introspection.

It is important to stress that monitoring and remedial action take place from the outside, using the hypervisor to provide hardware-enforced isolation. This means that our attackers can disable neither the surveillance nor the laser beams.

Of course, no security solution is perfect. This monitoring software may not detect suspicious activity that does not impact VM memory. This does not diminish the role of file-system-based security; we must still be vigilant, and there is no perfect defense. In our village analogy, we could also be attacked through underground infrastructure such as tunnels and sewers. In essence this means we have to use VMI together with traditional anti-malware software.

How does VMI compare to traditional hypervisor-facilitated anti-virus solutions such as vShield? In our analogy, these solutions require central management of all the surveillance equipment installed in our houses (CCTV, alarmed doors/windows, …), with the monitoring of events centralized very much like a security control centre in the village hall. Although such an approach significantly simplifies monitoring and managing what goes on within virtual machines, it does not deliver the extra protection that introspection provides.

You can find more information (including some demos) about VMI, the XenServer Direct Inspect API and Bitdefender Hypervisor Introspection here:

Xen Project Virtual Machine Introspection

Conclusion

The development of VMI and its first open source and commercial applications show how the Xen Project community is innovating in novel ways, and is capable of bringing revolutionary technologies to market. The freedom to see the code, to learn from it, to ask questions and offer improvements has enabled security researchers and vendors such as Citrix and Bitdefender to bring new solutions to market.

It is also worth pointing out that hardware-enabled security technology is moving very fast: only a subset of Intel’s #VE and VMFUNC extensions is currently used in VMI. Making use of more hardware extensions carries the promise of combining the protection of out-of-guest tools with the performance of in-guest tools.

What is even more encouraging is that other vendors such as A1Logic, Star Lab and Zentific are working on new Xen Project-based security solutions. In addition, the security focused, Xen-based OpenXT project has started to work more closely with the Xen Project community, which promises further security innovation.

A few of these topics will be discussed in more detail during the Xen Project Developer Summit, happening in Toronto, Canada from August 25 – 26, 2016. You can learn more about the event here.

Intel hosts OpenXT Summit on Xen Project based Client Virtualization, June 7-8 in Fairfax, VA, USA

This is a guest blog post by Rich Persaud, former member of the Citrix XenServer and XenClient engineering and business teams. He is currently a consultant to BAE Systems, working on the OpenXT project, which stands on the shoulders of the Xen Project, OpenEmbedded Linux and XenClient XT.

While the Xen Project is well known for servers and hosted infrastructure, Type-1 hypervisors have also been used in client endpoints and network appliances, improving security and remote manageability. Virtualization-based security in Qubes and Windows 10 is also educating system administrators about hardware security (IOMMU and TPM) and application trust models.

Released as open-source software in 2014, OpenXT is a development toolkit for hardware-assisted security research and appliance integration. It includes hardened Linux VMs that can be configured as a user-facing software appliance for client devices, with virtualization of storage, network, input, sound, display and USB devices. Hardware targets include laptops, desktops and workstations.

OpenXT stands on the shoulders of the Xen Project, OpenEmbedded Linux and XenClient XT. It is optimized for hardware-assisted virtualization with an IOMMU and a TPM. It configures Xen network driver domains, Linux stub domains, Xen Security Modules, Intel TXT, SE Linux, GPU passthrough and VPNs. Guest operating systems include Windows, Linux and FreeBSD. VM storage options include encrypted VHD files with boot-time measurement and non-persistence.

The picture below shows one of many configurations of the OpenXT software stack, including Xen, Linux and other components.

OpenXT enables loose coupling of open-source and proprietary software components, verifiable measurements of hardware and software, and verified launch of derivative products. It has been used to develop locally/centrally managed software appliances that isolate high-risk workloads, networks and devices.

The inaugural OpenXT Summit brings together developers and ecosystem participants for a 2-day conference in Fairfax, VA, USA on June 7-8, 2016. The event is hosted by Intel Corporation. The audience for this event includes kernel and application developers, hardware designers, system integrators and security architects.

The 2016 OpenXT Summit will chart the evolution of OpenXT from cross-domain endpoint virtualization to an extensible systems innovation platform, enabling derivative products to make security assurances for diverse hardware, markets and use cases.

The Summit includes one day of presentations, a networking reception and one day of moderated technical discussions. Presentation topics will include OpenXT architecture, TPM 2.0, Intel SGX, Xen security, measured launch, graphics virtualization and NSA research on virtualization and trusted computing.

For more information, please see the event website at http://openxt.org/summit.

For presentations and papers related to OpenXT, please see http://openxt.org/history.

Security vs Features

We’ve just released a rather interesting batch of Xen security advisories. This has given rise to grumbling in some quarters that Xen doesn’t take security seriously.

I have a longstanding interest in computer security. Nowadays I am a member of the Xen Project Security Team (the team behind security@xenproject, which drafts the advisories and coordinates the response). But here I am going to put forward my personal opinions.

Of course Invisible Things are completely right that security isn’t taken seriously enough. The general state of computer security in almost all systems is very poor. The reason for this is quite simple: we all put up with it. We, collectively, choose convenience and functionality: both when we decide which software to run for ourselves, and when we decide what contributions to make to the projects we care about. For almost all software there is much stronger pressure (from all sides) to add features, than to improve security.

That’s not to say that many of us working on Xen aren’t working to improve matters. The first part of improving anything is to know what the real situation is. Unlike almost all corporations, and even most Free Software projects, the Xen Project properly discloses, via an advisory, every vulnerability discovered in supported configurations. We also often publish advisories about vulnerabilities in other relevant projects, such as Linux and QEMU.

Security bugs are bugs, and over the last few years the Xen Project’s code review process has become a lot more rigorous. As a result, the quality of code being newly introduced into Xen has improved a lot.

For researchers developing new analysis techniques, Xen is a prime target. A significant proportion of the reports to security@xenproject are the result of applying new scanning techniques to our codebase. So our existing code is being audited, with a focus on the areas and techniques likely to discover the most troublesome bugs.

As I say, the Xen Project is very transparent about disclosing security issues; much more so than other projects. This difference in approach to disclosure makes it difficult to compare the security bug density of competing systems. When I worked for a security hardware vendor I was constantly under pressure to explain why we needed to do a formal advisory for our bugs. That is what security-conscious users expect, but our competitors’ salesfolk would point to our advisories and say that our products were full of bugs. Their products had no publicly disclosed security bugs, so they would tell naive customers that their products had no bugs.

I do think Xen probably has fewer critical security bugs than other hypervisors (whether Free or proprietary). It’s the best available platform for building high security systems. The Xen Project’s transparency is very good for Xen’s users. But that doesn’t mean Xen is good enough.

Ultimately, of course, a Free Software project like Xen is what the whole community makes it. In the project as a whole we get a lot more submissions of new functionality than we get submissions aimed at improving the security.

So personally, I very much welcome the contributions made by security-focused contributors – even if that includes criticism.

The Bitdefender virtual machine introspection library is now on GitHub

The security threats we’re facing today are becoming increasingly sophisticated. Rootkits and other malware taking advantage of kernel and 0-day vulnerabilities pose especially serious challenges for classic anti-malware solutions, due to the latter’s lack of isolation: they typically execute in the same context as the malware they’re trying to prevent.

However, via Xen guest memory introspection, the anti-malware application can now live outside of the monitored guest, without relying on functionality that can be rendered unreliable by advanced malware (no agent or any other type of special software needs to run inside the guest). It works by analyzing the raw memory of the guest OS and identifying interesting kernel memory areas (driver objects and code, IDT, etc.) and user-space memory areas (process code, stack, heap). This is made possible by harnessing existing hardware virtualization extensions (Intel EPT / AMD RVI). We hook the guest OS memory by marking interesting 4K pages as non-execute or non-write, then auditing accesses to those areas from code running in a separate domain.
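
A minimal sketch of the hooking step, assuming the Xen 4.6 libxenctrl mem_access interface (the event loop that audits the resulting faults from the separate domain is left out):

    #include <xenctrl.h>

    /* Sketch: mark a range of guest frames non-writable (or swap the
     * access type to watch execution instead), so that accesses generate
     * mem_access events for a monitoring domain to audit. */
    int hook_pages(xc_interface *xch, uint32_t domid,
                   uint64_t first_gfn, uint32_t nr)
    {
        /* Watch writes: the pages stay readable and executable. */
        return xc_set_mem_access(xch, domid, XENMEM_access_rx,
                                 first_gfn, nr);
    }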

Using these hooks, we can detect when a driver (or kernel module) is loaded or unloaded, when a new user process is created, when user stack or heap is allocated, and when memory is paged in or out.

Using these techniques, we can guard against known rootkit hooking methods, and protect specific user processes against code injection, function detouring, code executing from the stack or heap areas, and unpacked malicious code. We can also inject remediation tools into the guest on-the-fly (with no help needed from within the guest).

In our quest for zero-footprint guest introspection using Xen, we needed to be able to decide whether changes to the guest OS are benign or malicious, and, once decided, to control whether they are allowed to happen, or rejected.

To that end, we needed a way to be notified of interesting Xen guest events, with a very low overhead, that we could use in dom0 (or a similarly privileged dedicated domain), and that would connect our introspection logic component to the guest.

We’re very happy to announce that the library we’ve created to help us perform virtual machine introspection, libbdvmi, is now on GitHub, under the LGPLv3 license, here. The library is x86-specific, and while there’s some #ifdeferry suggesting that the earliest supported version is Xen 4.3, only Xen 4.6 will work out-of-the-box with it. It has been written from scratch, using libxenctrl and libxenstore directly.

Xen 4.6 is an important milestone, as it has all the features needed to build a basic guest memory introspection application. In addition to support for emulating instructions in response to mem_access events and for suppressing suspicious writes (both of which we were able to contribute to 4.5), with the kind help of the Xen developer community we have added support for XCR-write vm_events, guest-requested custom vm_events, memory content hiding when instructed to emulate an instruction via a mem_access event response, and denying suspicious CR and MSR writes in response to a vm_event.
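
For instance, denying a suspicious control-register write comes down to setting one flag in the vm_event response. A rough sketch against the Xen 4.6 vm_event interface, with the ring handling elided:

    #include <xen/vm_event.h>

    /* Sketch (Xen 4.6): req has been read from the vm_event ring; fill
     * rsp so the hypervisor discards the guest's attempted CR write. */
    void deny_cr_write(const vm_event_request_t *req, vm_event_response_t *rsp)
    {
        rsp->version = VM_EVENT_INTERFACE_VERSION;
        rsp->vcpu_id = req->vcpu_id;
        rsp->reason  = req->reason;     /* VM_EVENT_REASON_WRITE_CTRLREG */
        rsp->u       = req->u;          /* echo the event data back */
        rsp->flags   = req->flags | VM_EVENT_FLAG_DENY;  /* drop the write */
    }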

While LibVMI is great, and has been considered for the task, it has slightly different design goals:

  • in order to work properly, LibVMI requires that you install a configuration file into either $HOME/etc/libvmi.conf or /etc/libvmi.conf, with an entry for each virtual machine or memory image that LibVMI will access (see the example after this list); while it is possible to use it without extracting this detailed information, doing so severely limits the available guest information, and the requirement does reflect a core design concern;
  • it uses Glib for caching (which gets indirectly linked with client applications, and so needs to be distributed with the application, or otherwise available on the machine where the application will be deployed);
  • it doesn’t publicly offer a way of mapping guest pages from userspace (so that we could write to them directly). For those technically inclined, something along the lines of XenDriver::mapVirtMemToHost();
  • it offers no way to ask for a page fault injection in the guest.
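
For reference, this is roughly what a libvmi.conf entry for a Linux guest looks like. The format follows the LibVMI documentation, but the offsets are kernel-build specific; the path and values below are placeholders only:

    my-linux-vm {
        ostype = "Linux";
        sysmap = "/boot/System.map-3.16.0-4-amd64";
        linux_name = 0x460;
        linux_tasks = 0x240;
        linux_mm = 0x358;
        linux_pid = 0x2a8;
    }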

Libbdvmi aims to provide a very efficient way of working with Xen to access guest information in an OS-agnostic manner:

  • it only connects an introspection logic component to the guest, leaving on-the-fly OS detection and decision-making to it;
  • it provides a xenstore-based way to know when a guest has been started or stopped (see the sketch after this list);
  • has as few external library dependencies as possible – to that end, where LibVMI has used Glib for caching, we’ve only used STL containers, and the only other dependencies are libxenctrl and libxenstore;
  • allows mapping guest pages from userspace, with the caveat that this implies mapping a single page for writing, where LibVMI’s buffer-based writes would possibly touch several non-contiguous pages;
  • it works as fast as possible – we get a lot of events, so any unnecessary userspace / hypervisor context switches may incur unacceptable penalties (this is why one of our first patches had vm_events carry interesting register values instead of querying the guest from userspace after receiving the event);
  • last but not least, since the Xen vm_event code has been in quite a bit of flux, being able to immediately modify the library code to suit a new need, or to update the code for a new Xen version, has been a bonus to the project’s development pace.
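
The xenstore-based start/stop notification mentioned above needs nothing beyond libxenstore’s special watch paths. A minimal sketch, assuming only the standard xs_open/xs_watch/xs_read_watch API:

    #include <stdio.h>
    #include <stdlib.h>
    #include <xenstore.h>

    /* Sketch: block until a domain is introduced to, or released from,
     * xenstore, i.e. until a guest has been started or has gone away. */
    int main(void)
    {
        struct xs_handle *xsh = xs_open(0);
        unsigned int num;
        char **ev;

        xs_watch(xsh, "@introduceDomain", "start");
        xs_watch(xsh, "@releaseDomain", "stop");

        for (;;) {
            ev = xs_read_watch(xsh, &num);    /* blocks until an event */
            printf("guest event: %s\n", ev[XS_WATCH_TOKEN]);
            free(ev);
        }
    }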

We hope that the community will find it useful, and welcome discussion! It would be especially great if as much new functionality as possible makes its way into LibVMI.

Interested in learning more? Our Chief Linux Officer Mihai Dontu will be presenting “Zero-Footprint Memory Introspection with Xen” at LinuxCon North America on Wednesday, Aug. 19. Mihai is currently involved in further integrating Bitdefender’s hypervisor-based memory introspection technology with Xen. There are also two talks about Virtual Machine Introspection at the Xen Project Developer Summit: see Virtual Machine Introspection with Xen and Virtual Machine Introspection: Practical Applications.

Schrödinger’s Cat in a (Xen) Virtualized ‘Box’

Fedora Logo

Yes, apparently Schrödinger’s cat is alive: the latest release of Fedora, Fedora 19, codenamed “Schrödinger’s Cat”, has been released on July 2nd, and that even happened quite on time.

So, apparently, putting the cat “in a box” was way too easy, which is why we are taking the challenge to the next level: do you dare to put Schrödinger’s cat “in a virtual box”?

In other words, do you dare to install Fedora 19 within a Xen virtual machine? And if so, how about doing it using Fedora 19 itself as Dom0?