
How fast is Xen on ARM, really?

Stefano Stabellini

Jun 6, 2014

With Xen on ARM getting out of the early preview phase and becoming more mature, it is time to run a few benchmarks to check that the design choices paid off, the architecture is sound and the code base is solid. It is time to find out how much overhead Xen on ARM introduces and how it compares with Xen and other hypervisors on x86.
I measured the overhead by running the same benchmark in a virtual machine on Xen on ARM and on native Linux on the same hardware. It takes a bit longer to complete the benchmark inside a VM, but how much longer? The answer to this question is the virtualization overhead.

Setup

I chose AppliedMicro X-Gene as the ARM platform to run the benchmarks on: it is an ARMv8 64-bit SoC with an 8-core CPU and 16GB of RAM. Dom0 ran with 8 vcpus and 1GB of RAM; the virtual machine that ran the tests had 2GB of RAM and 8 vcpus. To make sure that the results are comparable, I restricted the amount of memory available to the native Linux run, so that Linux had all 8 physical cores at its disposal but only 2GB of RAM.
For the x86 tests, I used a Dell server with an Intel Xeon X5650, a 6-core CPU with HyperThreading. HyperThreading was disabled during the tests for better performance. As in the ARM tests, Dom0 ran with 6 vcpus and 1GB of RAM, and the virtual machine ran with 2GB of RAM and 6 vcpus. The native Linux run had 6 physical cores and 2GB of RAM. For the KVM tests, I booted the host with 3GB of RAM, then assigned 2GB of RAM to the KVM virtual machine.
In terms of software, on both ARMv8 and x86 I used:

Linux 3.13 as Dom0, DomU and native kernel
Xen 4.4
OpenSUSE 13.1
QEMU-KVM 1.6.2 (for the KVM tests on x86)

I could not test KVM on ARMv8 because KVM support for X-Gene is not upstream in Linux 3.13.

Benchmarks – lower is better

The y-axis shows the overhead as a percentage of native: “0%” means that it is as fast as native. “1%” means that it takes 1% longer to complete the benchmark inside a virtual machine than on native Linux. Given that we are dealing with overheads, lower is better.
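The definition above can be written down as a small calculation. This is only a sketch of the arithmetic for time-based benchmarks: the function name `overhead_percent` and the timings are made up for illustration and are not measurements from these tests.

```python
def overhead_percent(native_seconds: float, vm_seconds: float) -> float:
    """Return how much longer the VM run took, as a percentage of the native run."""
    if native_seconds <= 0:
        raise ValueError("native_seconds must be positive")
    return (vm_seconds - native_seconds) / native_seconds * 100.0


# Hypothetical timings, not results from this post:
# a run that takes 100 s natively and 101 s inside the VM.
print(f"{overhead_percent(100.0, 101.0):.1f}%")  # prints 1.0%
```

A result of 0% means the VM run was exactly as fast as native; a negative value would mean the VM run finished sooner than the native one.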

Kernbench

Kernbench is a popular benchmark that measures the time that it takes to compile the Linux kernel. It is a CPU- and memory-intensive benchmark.

PBzip2

PBzip2 is a parallel implementation of bzip2. This benchmark measures the time that it takes to compress a 4GB file.

SPECjbb2005 (non-compliant)

SPECjbb2005 simulates a Java server workload. It is a CPU- and memory-bound benchmark.
The runs are non-compliant (and therefore cannot be compared with compliant runs), and the overhead is calculated on the peak warehouse alone.

Next, I ran a couple of disk IO benchmarks. However, both the X-Gene and the Dell server came with spinning disks for storage, and the following tests showed that both disks were too slow to actually measure the virtualization overhead (the measured overhead is lower than 1%).

FIO

FIO is a popular tool to measure disk performance. This benchmark uses FIO to perform a combination of random reads and writes and measures the overhead on IOPS.
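For a throughput metric such as IOPS, higher is better, so the overhead has to be derived differently than for a run time. The post does not spell out the formula used, so the sketch below is an assumption: it treats the time per operation as the inverse of the throughput, which keeps "1%" meaning "takes 1% longer than native". The function name and the numbers are hypothetical.

```python
def throughput_overhead_percent(native_iops: float, vm_iops: float) -> float:
    """Overhead of a throughput metric, expressed as extra time per operation.

    Assumption (not stated in the post): time per operation is 1 / throughput,
    so the overhead is native_iops / vm_iops - 1.
    """
    if native_iops <= 0 or vm_iops <= 0:
        raise ValueError("throughput values must be positive")
    return (native_iops / vm_iops - 1.0) * 100.0


# Hypothetical figures, not results from this post.
print(f"{throughput_overhead_percent(200.0, 198.0):.2f}%")
```

With this convention, equal throughput gives 0%, and the result lines up with the time-based overhead used for the other benchmarks.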

PGBench

PGBench is the PostgreSQL database benchmarking tool. This benchmark is disk IO bound.

Conclusions

In developing Xen on ARM, we have focused on correctness and feature completeness rather than performance. Nonetheless, it provides a very low overhead that is already on par with or lower than Xen’s on x86, which in turn is lower than KVM’s on x86. Given the benefits that virtualization brings to the table, including ease of deployment and lower downtime, it really makes sense to deploy Xen on your ARM-based cloud.