Xen on ARM is becoming increasingly widespread in embedded environments. In these contexts, Xen is employed as a single solution to partition the system into multiple domains that are fully isolated from each other and have different levels of trust.
Every embedded scenario is different, but many require real-time guarantees. It comes down to interrupt latency: the hypervisor has to be able to deliver interrupts to virtual machines within a very small timeframe. The maximum tolerable latency varies on a case-by-case basis, but it should be in the realm of nanoseconds and microseconds, not milliseconds.
Xen on ARM meets these requirements in a few different ways. Firstly, Xen comes with a flexible scheduler architecture. It includes a set of virtual machine schedulers, among them RTDS, a soft real-time scheduler, and ARINC653, a hard real-time scheduler. Users can pick the one that performs best for their use case. However, if they really care about latency, the best option is to avoid scheduling decisions altogether and statically assign virtual cpus to physical cpus instead. There is no automatic way to do that today, but it is quite easy to achieve with the vcpu-pin command:
Usage: xl vcpu-pin [domain-id] [vcpu-id] [pcpu-id]
For example, in a system with four physical cpus and two domains with two vcpus each, a user can get a static configuration with the following commands:
xl vcpu-pin 0 0 0
xl vcpu-pin 0 1 1
xl vcpu-pin 1 0 2
xl vcpu-pin 1 1 3
As a result, all vcpus are pinned to different physical cpus. In such a static configuration, the latency overhead introduced by Xen is minimal. Xen always configures interrupts to target the cpu that is running the virtual cpu that should receive the interrupt. Thus, the overhead is down to just the time that it takes to execute the code in Xen to handle the physical interrupt and inject the corresponding virtual interrupt to the vcpu.
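With more domains, typing the pinning commands by hand gets tedious. The following is a minimal sketch that generates the same one-to-one xl vcpu-pin commands shown above. The function name and the input format (a mapping from domain id to vcpu count) are assumptions made for illustration; the script only builds the command strings and does not call xl itself.

```python
def static_pin_commands(vcpus_per_domain, num_pcpus):
    """Build xl vcpu-pin commands that give every vcpu its own physical cpu.

    vcpus_per_domain: dict mapping a domain id to its number of vcpus
                      (hypothetical input format, chosen for this sketch).
    num_pcpus:        number of physical cpus available.
    """
    total_vcpus = sum(vcpus_per_domain.values())
    if total_vcpus > num_pcpus:
        # A one-to-one static assignment is impossible in this case.
        raise ValueError("more vcpus than physical cpus")

    commands = []
    next_pcpu = 0
    for domid in sorted(vcpus_per_domain):
        for vcpu in range(vcpus_per_domain[domid]):
            # Same argument order as the usage line: domain-id vcpu-id pcpu-id
            commands.append(f"xl vcpu-pin {domid} {vcpu} {next_pcpu}")
            next_pcpu += 1
    return commands


if __name__ == "__main__":
    # Four physical cpus, two domains with two vcpus each.
    for command in static_pin_commands({0: 2, 1: 2}, num_pcpus=4):
        print(command)
```

The generated lines can be reviewed first and then run from Dom0.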
For my measurements, I used a Xilinx Zynq UltraScale+ MPSoC, an excellent board with four Cortex-A53 cores and a GICv2 interrupt controller. I installed Xen 4.9 unstable (changeset 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a) and Linux 4.6.0 as Dom0. As a guest I ran tbm, a tiny baremetal application that programs timer events in the future and then, after receiving each one, checks the current time again to measure the latency. tbm uses the virtual timer for its measurements. However, Xen handles the virtual timer interrupt differently from any other interrupt. Thus, to make the results more generally applicable, I modified tbm to use the physical timer interrupt instead. I also modified Xen to forward physical timer interrupts to guests.
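To illustrate the measurement idea, here is a minimal sketch of the arithmetic behind one latency sample: the difference between the counter value at which the timer was programmed to fire and the counter value read in the interrupt handler, converted to nanoseconds. This is not tbm's code. The function name and the 100 MHz counter frequency are assumptions; on real hardware the frequency would be read from the system.

```python
NS_PER_SECOND = 1_000_000_000


def latency_ns(deadline_ticks, observed_ticks, counter_hz):
    """Convert the gap between the programmed deadline and the handler's
    counter read into nanoseconds (integer arithmetic, rounded down)."""
    elapsed_ticks = observed_ticks - deadline_ticks
    if elapsed_ticks < 0:
        raise ValueError("interrupt observed before its deadline")
    return elapsed_ticks * NS_PER_SECOND // counter_hz


if __name__ == "__main__":
    ASSUMED_COUNTER_HZ = 100_000_000  # assumption: 100 MHz, i.e. 10ns per tick
    # Timer programmed for tick 1000, handler read the counter at tick 1485.
    print(latency_ns(1000, 1485, ASSUMED_COUNTER_HZ))
```

Note that the counter frequency bounds the resolution: at the assumed 100 MHz, no sample can be more precise than 10ns.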
Keeping in mind that the native interrupt latency is about 300ns on this board, these are the results on Xen in nanoseconds:
AVG   MIN   MAX   WARM_MAX
4850  4810  7030  4980
AVG is the average latency, MIN is the minimum, MAX is the maximum and WARM_MAX is the maximum latency observed after discarding the first few interrupts to warm the caches. For real-time considerations, the number to keep in mind is WARM_MAX, which is about 5000ns (4980ns measured) when using static vcpu assignments.
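The four columns can be computed from raw samples as in the sketch below. The function name, the warm-up count, and the sample values are invented for illustration and are not measured data. Whether AVG, MIN and MAX in the table above include the warm-up samples is also an assumption here.

```python
def summarize(samples_ns, warmup=2):
    """Compute AVG, MIN, MAX and WARM_MAX from latency samples in ns.

    warmup is the number of initial samples ignored for WARM_MAX only
    (assumption: the other three statistics cover every sample).
    """
    if len(samples_ns) <= warmup:
        raise ValueError("need more samples than the warm-up count")
    warm_samples = samples_ns[warmup:]
    return {
        "AVG": sum(samples_ns) // len(samples_ns),
        "MIN": min(samples_ns),
        "MAX": max(samples_ns),
        "WARM_MAX": max(warm_samples),
    }


if __name__ == "__main__":
    # Invented samples: the first two are slow because the caches are cold.
    invented_samples = [900, 700, 520, 500, 510, 530]
    print(summarize(invented_samples, warmup=2))
```

In the invented data the cold-cache samples inflate MAX but leave WARM_MAX untouched, which is why WARM_MAX is the better indicator of steady-state worst-case latency.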
This excellent result is small enough for most use cases, including piloting a flying drone. However, it can be further improved by using the new vwfi Xen command line option. Specifically, when vcpus are statically assigned to physical cpus using vcpu-pin, it makes sense to pass vwfi=native to Xen: it tells the hypervisor not to trap wfi and wfe, the ARM instructions a guest uses to put the cpu to sleep while it waits for an interrupt or an event. If no other vcpus can ever be scheduled on a given physical cpu, then we might as well let the guest put the cpu to sleep directly. With vwfi=native, the results are:
AVG   MIN   MAX   WARM_MAX
1850  1680  2650  1950
With this configuration, the latency is only about 2 microseconds (1950ns WARM_MAX), which is extremely close to the hardware limits, and should be small enough for the vast majority of use cases. vwfi was introduced recently, but it has been backported to all the Xen stable trees.
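To put the two result sets side by side, the short sketch below subtracts the native latency of about 300ns from each WARM_MAX value to estimate the overhead attributable to Xen. The numbers are the ones reported above. The variable and function names are made up for this example.

```python
NATIVE_NS = 300  # approximate native interrupt latency on this board

# WARM_MAX values from the two result tables, in nanoseconds.
WARM_MAX_NS = {
    "static pinning": 4980,
    "static pinning + vwfi=native": 1950,
}


def xen_overhead_ns(warm_max_ns, native_ns=NATIVE_NS):
    """Latency added on top of the native figure."""
    return warm_max_ns - native_ns


if __name__ == "__main__":
    for config, warm_max in WARM_MAX_NS.items():
        print(f"{config}: {xen_overhead_ns(warm_max)}ns over native")
```

By this estimate, vwfi=native cuts the overhead from 4680ns to 1650ns.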
In addition to vcpu pinning and vwfi, the third key parameter to reduce interrupt latency is unexpectedly simple: the DEBUG kconfig option in Xen. DEBUG is enabled by default in all cases except for releases. It adds many useful debug messages and checks, at the cost of increased latency. Make sure to disable it in production and when doing measurements.