Linux PV Guest Performance Improvements

The Xen community achieved a major milestone last summer when all the necessary components for Xen dom0 support made it into the upstream kernel for the 3.0 release. During that process, however, developers focused on functionality rather than performance. As a result, a handful of performance regressions were introduced in pv-ops kernels compared to the classic kernels.

Recently I started looking at the performance of pv-ops Linux, using xentrace and xenalyze (see George Dunlap’s presentation for an introduction) to compare the number and pattern of hypercalls between a classic 2.6.32 kernel and a 3.3 pv-ops kernel. I found a number of performance regressions which, luckily, are either easily fixed or have minimal impact. The individual fixes are listed below.

xen_version hypercall spam

Xen guests check for pending events (interrupts) raised by the hypervisor when leaving a hypercall. If local event delivery is disabled, the guest must do a dummy hypercall when re-enabling events to check for any it may have missed. The xen_version hypercall is used for this because it has no side effects.

The pv-ops kernel was checking for pending events much more often than necessary.

Fix: "xen: correctly check for pending events when restoring irq flags", available in 3.4-rc2.
Result: About 10% performance improvement to a wide range of kernel operations.
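
The intended behaviour when restoring the event mask can be sketched as follows. This is a minimal user-space model, not the kernel code: struct vcpu_state, force_event_check() and restore_event_mask() are hypothetical names, and the hypercall is simulated with a counter. The point it illustrates is that the dummy hypercall is only worth making when events are being re-enabled and one is actually pending.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vCPU state a guest shares with Xen. */
struct vcpu_state {
    bool events_masked;             /* local event delivery is disabled */
    bool event_pending;             /* an event arrived while masked */
    unsigned long dummy_hypercalls; /* simulated xen_version calls */
};

/* Stand-in for the side-effect-free xen_version hypercall: in this model,
 * any pending event is delivered on the way out of the hypercall. */
static void force_event_check(struct vcpu_state *v)
{
    v->dummy_hypercalls++;
    v->event_pending = false;
}

/* Restore a saved event mask. A hypercall is made only when events are
 * being re-enabled and one is actually pending. */
void restore_event_mask(struct vcpu_state *v, bool masked)
{
    assert(v != NULL);
    v->events_masked = masked;
    if (!masked && v->event_pending)
        force_event_check(v);
}
```

In this model, unmasking with nothing pending costs no hypercall at all, which is the common case on a busy system.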

Unnecessary TLS descriptor updates during task switch

On the x86 architecture, the location of a thread’s thread-local storage (TLS) is stored in three entries in the global descriptor table (GDT). Under Xen, the GDT is managed by the hypervisor, and guests use three hypercalls (bundled in a multicall) to update the descriptors on every context switch. Very often the descriptors don’t need to change, so the kernel can avoid many of these hypercalls by tracking the previously loaded values.

Fix: "xen/x86: avoid updating TLS descriptors if they haven’t changed", available in 3.6-rc1.
Result: About 9% improvement in context switch time.
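
The idea of skipping unchanged descriptors can be sketched as below. This is an illustrative model under stated assumptions rather than the actual patch: struct desc, struct cpu_tls_cache and load_tls_cached() are hypothetical names, and each descriptor update hypercall is represented by incrementing a counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLS_ENTRIES 3 /* three GDT entries hold a thread's TLS descriptors */

/* One 8-byte descriptor, split into two 32-bit words. */
struct desc {
    uint32_t a, b;
};

/* Hypothetical per-CPU record of the descriptors that were last loaded. */
struct cpu_tls_cache {
    struct desc current[TLS_ENTRIES];
    unsigned long hypercalls; /* simulated descriptor-update hypercalls */
};

/* Load the next thread's TLS descriptors, skipping any entry that already
 * holds the wanted value. Returns the number of entries actually updated. */
unsigned load_tls_cached(struct cpu_tls_cache *cpu,
                         const struct desc next[TLS_ENTRIES])
{
    unsigned updated = 0;

    assert(cpu != NULL && next != NULL);
    for (int i = 0; i < TLS_ENTRIES; i++) {
        if (cpu->current[i].a == next[i].a &&
            cpu->current[i].b == next[i].b)
            continue; /* unchanged: no hypercall needed */
        cpu->current[i] = next[i];
        cpu->hypercalls++; /* stands in for one update hypercall */
        updated++;
    }
    return updated;
}
```

Switching between two threads with identical TLS descriptors then costs no hypercalls in this model, and a thread that differs in a single entry costs one instead of three.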

PTE updates for page faults were emulated (32-bit guests only)

When a userspace process accesses memory that is not currently mapped into its address space (e.g., if a file has been newly mapped with mmap(), or the process has been swapped out), a page fault exception occurs. The page fault handler then updates the page table entry (PTE) for that page to make it accessible to the process. Under Xen, this update was done by writing directly to the read-only page table, and Xen would trap and emulate this memory write. PTEs are 64 bits wide, so with a 32-bit guest the two 32-bit writes result in two traps into Xen. By using a single hypercall we halve the number of entries into Xen.

Fix: "xen/mm: do direct hypercall in xen_set_pte() if batching is unavailable", available in 3.6-rc1.
Result: About 25% improvement in page fault speed.
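
The difference in the number of entries into Xen can be illustrated with a small model. This is only a sketch: struct pt_sim, set_pte_emulated() and set_pte_hypercall() are hypothetical names, the trap and the hypercall are both simulated with a counter, and the order of the two 32-bit stores is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one PTE slot plus a count of entries into Xen. */
struct pt_sim {
    uint64_t pte;
    unsigned long xen_entries;
};

/* A 32-bit store to a read-only page table: it traps, and Xen emulates it. */
static void trapped_write32(struct pt_sim *s, int high_half, uint32_t val)
{
    s->xen_entries++;
    if (high_half)
        s->pte = (s->pte & 0x00000000ffffffffULL) | ((uint64_t)val << 32);
    else
        s->pte = (s->pte & 0xffffffff00000000ULL) | val;
}

/* Old path for a 32-bit guest: two 32-bit stores, so two traps. The low
 * half, which holds the present bit, is written last in this sketch. */
void set_pte_emulated(struct pt_sim *s, uint64_t val)
{
    assert(s != NULL);
    trapped_write32(s, 1, (uint32_t)(val >> 32));
    trapped_write32(s, 0, (uint32_t)val);
}

/* New path: one explicit hypercall carries the whole 64-bit value. */
void set_pte_hypercall(struct pt_sim *s, uint64_t val)
{
    assert(s != NULL);
    s->xen_entries++;
    s->pte = val;
}
```

Both paths leave the same value in the PTE; the hypercall path simply reaches it with one entry into Xen instead of two.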

Two traps per page when unmapping pages in munmap() (32-bit guests only)

When unmapping pages from a userspace process (such as with munmap()), the corresponding PTE must be cleared and the current dirty and accessed bits must be saved. The kernel’s ptep_get_and_clear() function does both of these atomically. In a 32-bit pv-ops kernel this is implemented with two 32-bit accesses (an xchg and a store), so each page costs two traps into Xen, whereas classic kernels do one 64-bit access (a cmpxchg8b).

Unfortunately, there isn’t a way to fix this in Xen-specific code. Profiling suggests that very little time (< 1%) is spent doing munmap() in a running system, so improving its performance is going to have very little real-world benefit.

Fix: None planned.
