Xen on ARM interrupt latency

Xen on ARM is becoming more and more widespread in embedded environments. In these contexts, Xen is employed as a single solution to partition the system into multiple domains, fully isolated from each other, and with different levels of trust.
Every embedded scenario is different, but many require real-time guarantees. It comes down to interrupt latency: the hypervisor has to be able to deliver interrupts to virtual machines within a very small timeframe. The maximum tolerable lag changes on a case by case basis, but it should be in the realm of nanoseconds and microseconds, not milliseconds.
Xen on ARM meets these requirements in a few different ways. Firstly, Xen comes with a flexible scheduler architecture. Xen includes a set of virtual machine schedulers, including RTDS, a soft real-time scheduler, and ARINC653, a hard real-time scheduler. Users can pick the ones that perform best for their use-cases. However, if they really care about latency, the best option is to have no schedulers at all and use a static assignment for virtual cpus to physical cpus instead. There are no automatic ways to do that today, but it is quite easy to achieve with the vcpu-pin command:

Usage: xl vcpu-pin [domain-id] [vcpu-id] [pcpu-id]

For example, in a system with four physical cpus and two domains with two vcpus each, a user can get a static configuration with the following commands:

xl vcpu-pin 0 0 0
xl vcpu-pin 0 1 1
xl vcpu-pin 1 0 2
xl vcpu-pin 1 1 3

As a result, all vcpus are pinned to different physical cpus. In such a static configuration, the latency overhead introduced by Xen is minimal. Xen always configures interrupts to target the cpu that is running the virtual cpu that should receive the interrupt. Thus, the overhead is down to just the time that it takes to execute the code in Xen to handle the physical interrupt and inject the corresponding virtual interrupt to the vcpu.
For my measurements, I used a Xilinx Zynq Ultrascale+ MPSoC, an excellent board with four Cortex A53 cores and a GICv2 interrupt controller. I installed Xen 4.9 unstable (changeset 55a04feaa1f8ab6ef7d723fbb1d39c6b96ad184a) and Linux 4.6.0 as Dom0. I ran tbm as a guest, which is a tiny baremetal application that programs timer events in the future, then, after receiving them, checks the current time again to measure the latency. tbm uses the virtual timer for measurements, however, the virtual timer interrupt is handled differently compared to any other interrupts in Xen. Thus, to make the results more generally applicable, I modified tbm to use the physical timer interrupt instead. I also modified Xen to forward physical timer interrupts to guests.
Keeping in mind that the native interrupt latency is about 300ns on this board, these are the results on Xen in nanoseconds:

AVG  MIN  MAX  WARM_MAX
4850 4810 7030 4980

AVG is the average latency, MIN is the minimum, MAX is the maximum and WARM_MAX is the maximum latency observed after discarding the first few interrupts to warm the caches. For real-time considerations, the number to keep in mind is WARM_MAX, which is 5000ns (when using static vcpu assignments).
This excellent result is small enough for most use cases, including piloting a flying drone. However, it can be further improved by using the new vwfi Xen command line option. Specifically, when vcpus are statically assigned to physical cpus using vcpu-pin, it makes sense to pass vwfi=native to Xen: it tells the hypervisor not to trap wfi and wfe commands, which are ARM instructions for sleeping. If no other vcpus can ever be scheduled on a given physical cpu, then we might as well let the guest put the cpu to sleep. Passing vwfi=native, the results are:

AVG  MIN  MAX  WARM_MAX
1850 1680 2650 1950

With this configuration, the latency is only 2 microseconds, which is extremely close to the hardware limits, and should be small enough for the vast majority of use cases. vwfi was introduced recently, but it has been backported to all the Xen stable trees.
In addition to vcpu pinning and vwfi, the third key parameter to reduce interrupt latency is unexpectedly simple: the DEBUG kconfig option in Xen. DEBUG is enabled by default in all cases except for releases. It adds many useful debug messages and checks, at the cost of increased latency. Make sure to disable it in production and when doing measurements.

Read more

Xen Project Announces Performance and Security Advancements with Release of 4.19
08/05/2024

New release marks significant enhancements in performance, security, and versatility across various architectures.  SAN FRANCISCO – July 31st, 2024 – The Xen Project, an open source project under the Linux Foundation, is proud to announce the release of Xen Project 4.19. This release marks a significant milestone in enhancing performance, security,

Upcoming Closure of Xen Project Colo Facility
07/10/2024

Dear Xen Community, We regret to inform you that the Xen Project is currently experiencing unexpected changes due to the sudden shutdown of our colocated (colo) data center facility by Synoptek. This incident is beyond our control and will impact the continuity of OSSTest (the gating Xen Project CI loop)

Xen Summit Talks Now Live on YouTube!
06/18/2024

Hello Xen Community! We have some thrilling news to share with you all. The highly anticipated talks from this year’s Xen Summit are now live on YouTube! Whether you attended the summit in person or couldn’t make it this time, you can now access all the insightful presentations

Get ready for Xen Summit 2024!
05/24/2024

With less than 2 weeks to go, are you ready? The Xen Project is gearing up for a summit full of discussions, collaboration and innovation. If you haven’t already done so – get involved by submitting a design session topic. Don’t worry if you can’t attend in person,