The Graphics Processing Unit (GPU) has become a fundamental building block in today's computing environment, accelerating tasks from entertainment applications (gaming, video playback, etc.) to general-purpose windowing (Windows* Aero*, Compiz Fusion, etc.) and high-performance computing (medical image processing, weather forecasting, computer-aided design, etc.).
Today, we see a trend toward moving GPU-accelerated tasks into virtual machines (VMs). Desktop virtualization simplifies the IT management infrastructure by moving a worker's desktop into a VM. At the same time, there is growing demand for buying GPU computing resources from the cloud. Efficient GPU virtualization is required to address these increasing demands.
Enterprise applications (mail, browser, office, etc.) usually demand a moderate level of GPU acceleration capability. When they are moved to a virtual desktop, our integrated GPU can easily accommodate the acceleration requirements of multiple instances.
Let's first look at the architecture of Intel Processor Graphics:
The render engine provides the GPU acceleration capabilities, with fixed-function pipelines and execution units that are driven by GPU commands queued in command buffers. The display engine routes data from graphics memory to external monitors, and holds the state of display attributes (resolution, color depth, etc.). The global state represents all the remaining functionality, including initialization, power control, etc. Graphics memory holds the data used by the render engine and display engine.
Intel Processor Graphics uses system memory as graphics memory, accessed through the graphics translation table (GTT). A single 2GB global virtual memory (GVM) space is available to all GPU components through the global GTT (GGTT). In addition, multiple per-process virtual memory (PPVM) spaces are created through per-process GTTs (PPGTTs), extending the limited GVM resource and enforcing isolation between processes.
Graphics Virtualization Technologies
Several technologies achieve graphics virtualization, as illustrated in the image below, with more hardware acceleration toward the right.
Device emulation is mainly used in server virtualization, typically emulating a legacy VGA display card; QEMU is the most widely used vehicle. Full emulation of a modern GPU is practically impossible, due to its complexity and the resulting extremely poor performance.
API forwarding implements a frontend/backend driver pair. The frontend driver forwards high-level DirectX/OpenGL API calls from the VM to the backend driver in the host through an optimized inter-VM channel. The backend drivers behave like normal 3D applications in the host, so a single GPU can be multiplexed to accelerate multiple VMs. However, differences between the VM and host graphics stacks easily lead to reduced performance or compatibility issues. Because it is hardware-agnostic, this has been the most widely used technology so far. Actual implementations vary, depending on the level at which forwarding happens. For example, VMGL directly forwards GL commands, while VMware vGPU presents itself as a virtual device, with high-level DirectX calls translated into its private SVGA3D protocol. Another recent example is Virgil, with its experimental virtual 3D support for QEMU.
Direct pass-through, based on VT-d, assigns the whole GPU exclusively to a single VM. While it achieves the best performance, it sacrifices the sharing capability.
Mediated pass-through extends direct pass-through, using a software approach. Every VM is allowed to access part of the device resources without hypervisor intervention, while privileged operations are mediated through a software layer. It sustains the performance of direct pass-through, while still providing the sharing capability. XenGT adopts this technology.
XenGT is a full GPU virtualization solution with mediated pass-through, on Intel Processor Graphics. A virtual GPU instance is maintained for each VM, with a portion of the performance-critical resources directly assigned. Running the native graphics driver inside a VM, without hypervisor intervention in performance-critical paths, achieves a good balance among performance, features, and sharing capability.
The figure above shows the overall XenGT architecture. Each VM is allowed to access some performance-critical resources without hypervisor intervention. Privileged operations are trapped by Xen and forwarded to the mediator for emulation. The mediator emulates a virtual GPU instance for each VM, and conducts context switches when moving the GPU between VMs. XenGT implements the mediator in dom0. This avoids adding complex device knowledge to Xen, and also permits a more flexible release model. At the same time, we want a unified architecture that mediates all the VMs, including dom0 itself, so the mediator is implemented as a module separate from dom0's graphics driver. This brings a new challenge: Xen must selectively trap accesses from dom0's driver while granting permission to the mediator. We call this a "de-privileged" dom0 mode.
The performance-critical resources passed through to a VM are:
- Part of the global virtual memory space
- The VM's own per-process virtual memory spaces
- The VM's own allocated command buffers (actually in graphics memory)
This minimizes hypervisor intervention on the critical rendering path. Even when a VM is not scheduled to use the render engine, it can continue queuing commands in parallel.
Other operations are privileged, and must be trapped and emulated by the mediator, including:
- Accesses to PCI configuration registers
- Updates to the GTT tables
- Submission of queued GPU commands
The mediator maintains the virtual GPU instance based on the traps mentioned above, and schedules use of the render engine among VMs to ensure secure sharing of the single physical GPU.
The latest source code and the setup guide are available in the GitHub repositories:
(The first repository has a XenGT_Setup_Guide.pdf, which supplies step-by-step instructions for getting a system set up.)
Patches are welcome!
We plan to upstream this work, and are now preparing some cleanup.
XenGT was first announced in September 2013:
It was presented at the 2013 Xen Project Developer Summit, Edinburgh:
An update was announced in February 2014: