The “Normal” Case
To explain what a stub-domain (often called a stubdom) is, let’s start with the basics. When you start a new guest with Xen, you need a device model to emulate hardware if the guest does not have PV drivers. This is the case for an HVM domain.
This device model, QEMU, needs to run somewhere. By default, it runs in Dom0, as root.
Here is a picture:
Now, every time the guest performs an IO, such as reading a file from the disk, an event is sent through Xen to the device model, which does the emulation and sends the result back to the guest.
Furthermore, a new device model is started in Dom0 for every new guest. This leads to a couple of issues:
- Firstly, the QEMU instances compete with each other and with other services in Dom0. If Dom0 is busy running something else when an IO request arrives, the request is delayed.
- Secondly, if there is a security vulnerability in QEMU, an attacker exploiting it could gain privileged access to Dom0, since QEMU runs there as root.
Fortunately, there is a solution: we can start the device model in its own domain.
Here is how it looks:
In this case, QEMU runs in its own dedicated domain. When there is an IO request, the device model takes care of it right away, because it has nothing else to do. This means less delay in handling an IO event.
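The delay argument can be made concrete with a toy model. The sketch below assumes a single CPU shared round-robin between the tasks queued ahead of QEMU, each running for at most one time slice. It is not how Xen's or Dom0 Linux's schedulers actually behave, and it ignores the fact that a stub-domain's vCPU still has to be scheduled by Xen; the function name and the workload numbers are hypothetical.

```python
# Toy model only: one CPU shared round-robin between runnable tasks.
# Real Xen and Linux schedulers are far more complex; this just shows why
# work queued ahead of QEMU in a busy Dom0 delays the IO request.

def io_wait_ms(tasks_ahead_ms, slice_ms):
    """Time (ms) before QEMU gets the CPU.

    `tasks_ahead_ms` lists the remaining CPU work of each task queued ahead of
    QEMU; each task runs for its remaining work or one slice, whichever is less.
    """
    return sum(min(work, slice_ms) for work in tasks_ahead_ms)

# Hypothetical workloads: other services and device models busy in Dom0,
# versus a stub-domain where QEMU has nothing else to compete with.
dom0_queue = [10.0, 2.5, 30.0]
stubdom_queue = []

print(io_wait_ms(dom0_queue, slice_ms=5.0))     # 12.5
print(io_wait_ms(stubdom_queue, slice_ms=5.0))  # 0
```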
This configuration also puts an extra layer between an attacker and Dom0: if there is a vulnerability in the device model, the attacker only gains access to the stub-domain, which has the same privileges as the guest.
How to do it?
Xen already provides a stubdom, but it needs to be updated or changed in order to use upstream QEMU.
This stubdom uses Mini-OS as its kernel and works well with QEMU traditional, but it is limited: it provides only newlib, a C library covering a small subset of libc. (More information about Mini-OS: http://www.cs.uic.edu/~spopuri/minios.html)
To use upstream QEMU in the stubdom, there are two solutions:
- Provide a better libc.
- Use Linux as the kernel.
I chose to implement the second solution.
Linux based Stub-Domain
To make the Linux kernel work as a stubdom, a few changes are needed. Currently, Linux allows only Dom0 to use the privileged commands, in particular the one that maps the memory of another domain, but the stubdom also needs to call those functions. We simply have to allow a stubdom to call them too.
QEMU also needs a few changes. One is to set an “hvm_param” indicating that the domain is a device model domain, or stubdom. We also disable the initialization of the PV driver backends, as these are provided by Dom0.
LibXL needs to be taught how to set up this new kind of stubdom, because it is quite different from the Mini-OS one.
Finally, we need to create an initramfs for this stubdom. It contains QEMU, some required tools such as BusyBox and the XenStore tools, and an init script that sets up the network to properly redirect traffic between Dom0 and the guest, then starts QEMU with the arguments passed through XenStore.
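The last step of the init script, starting QEMU with the arguments passed through XenStore, can be sketched as follows. This is a minimal sketch under several assumptions: the arguments are stored as a single space-separated string, the key name `device-model-args` and the QEMU binary path are hypothetical (the key LibXL actually writes in the preview branch may differ), and XenStore is read with the standard `xenstore-read` tool. Python is used only for illustration; an initramfs this small would more likely use a BusyBox shell script.

```python
import shlex
import subprocess

# Hypothetical key holding the device model arguments as one string; the key
# actually written by LibXL in the preview branch may be different.
DM_ARGS_KEY = "/local/domain/{domid}/device-model-args"

def xenstore_read(key):
    """Read one key with the `xenstore-read` tool (only works inside a Xen domain)."""
    result = subprocess.run(["xenstore-read", key],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

def qemu_argv(read, domid, qemu="/bin/qemu-system-i386"):
    """Build QEMU's argv from the argument string stored in XenStore.

    `read` maps a key to its value; passing a dict's __getitem__ lets the
    sketch run without Xen. `domid` is whichever domain the key was stored
    under (an assumption of this sketch).
    """
    return [qemu] + shlex.split(read(DM_ARGS_KEY.format(domid=domid)))

# Made-up XenStore contents for illustration.
fake_store = {"/local/domain/5/device-model-args": "-xen-domid 4 -machine xenfv"}
print(qemu_argv(fake_store.__getitem__, domid=5))
```

Inside the stubdom, the init script would pass `xenstore_read` as the reader and then replace itself with QEMU, for example with `os.execv(argv[0], argv)`. Using `shlex.split` keeps quoted arguments (such as a guest name containing spaces) intact.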
Currently, we can start a domain with both a serial console and network, with a stubdom using about 30 MiB of memory.
However, this work is not yet well integrated into LibXL or the build system. To access the guest’s serial console, you need to look up its tty in the XenStore database and open it with the serial console client of your choice (minicom, screen); `xl console` does not work.
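Looking up the tty can be sketched as scanning a XenStore dump for keys ending in `/tty`. The sketch assumes the `path = "value"` line format printed by `xenstore-ls -f`. Since the post does not say which key holds the guest's serial port in this setup, it simply lists every `tty` entry under a given domain; the function name, domain ID, and sample dump are made up.

```python
import re

# Matches lines such as: /local/domain/7/console/tty = "/dev/pts/4"
# (the `path = "value"` format printed by `xenstore-ls -f`).
LINE = re.compile(r'^(?P<path>\S+) = "(?P<value>.*)"$')

def find_ttys(dump, domid):
    """Return {xenstore path: tty device} for every `.../tty` key of `domid`."""
    prefix = f"/local/domain/{domid}/"
    ttys = {}
    for line in dump.splitlines():
        m = LINE.match(line.strip())
        if m and m["path"].startswith(prefix) and m["path"].endswith("/tty"):
            ttys[m["path"]] = m["value"]
    return ttys

# Made-up dump, as `xenstore-ls -f` might print it.
sample = """\
/local/domain/7/name = "guest-dm"
/local/domain/7/console/tty = "/dev/pts/4"
/local/domain/7/serial/0/tty = "/dev/pts/5"
/local/domain/6/console/tty = "/dev/pts/3"
"""
for path, tty in find_ttys(sample, 7).items():
    print(path, "->", tty)
```

The chosen device can then be opened with, for example, `screen /dev/pts/5` or `minicom -D /dev/pts/5`.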
Another thing that does not work at all is video output; this will be dealt with later.
The Linux-based stub-domain is a work in progress, but it shows good promise, as you will see now.
Lies, damned lies, and benchmarks
Now, let’s see how this Linux-based stubdom performs compared to QEMU in Dom0 and to the Mini-OS-based stubdom. For this purpose, we use a machine with:
- 8 CPU AMD Opteron,
- 8 GB of RAM,
- Arch Linux 64 bit as dom0 (Linux 3.4.8),
- Arch Linux 64 bit for domU (Linux 3.4.8).
For these benchmarks, only emulated devices are used, meaning that no PV drivers are benchmarked.
The network benchmark uses iperf, and the guest has an emulated e1000 network interface controller. The result is the average over 3 minutes, measuring traffic in one direction at a time; bigger is better.
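The averaging can be sketched as follows, assuming the per-interval byte counts of each 3-minute run have already been taken from iperf's periodic reports (parsing iperf's output is left out, as its format varies between versions). Dividing the total number of bits by the total time weights each interval by its duration, which matters if, for example, the last report covers a shorter interval. The function name and the sample data are hypothetical, not results from this benchmark.

```python
def mean_throughput_mbit_s(intervals):
    """Average throughput of one run in Mbit/s (1 Mbit = 10**6 bits).

    `intervals` holds (duration_s, bytes_transferred) pairs, one per iperf
    report. Total bits over total time weights each interval by its length,
    unlike a plain mean of the per-interval rates.
    """
    total_s = sum(duration for duration, _ in intervals)
    if total_s <= 0:
        raise ValueError("no measured time")
    total_bits = sum(8 * nbytes for _, nbytes in intervals)
    return total_bits / total_s / 1e6

# Made-up samples: 180 one-second reports of 12.5 MB each, i.e. a 3-minute run.
run = [(1.0, 12_500_000)] * 180
print(mean_throughput_mbit_s(run))  # 100.0
```

Since traffic is measured one way at a time, each direction (guest receiving, guest sending) would be averaged separately.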
We compare the Linux stubdom with the standalone versions of QEMU running in Dom0, both QEMU upstream and QEMU traditional.
The result is not great: reception improves with the Linux stubdom, but transmission gets worse. This will need to be investigated in the future.
The disk benchmark uses dd. The guest disk is on an LVM volume, and the guest uses QEMU’s emulated IDE disk.
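The post does not give the exact dd invocation, so the following is a sketch under assumptions: GNU dd, a read test from a hypothetical guest device `/dev/sda` with `iflag=direct` (O_DIRECT, so that data is not served from the guest's page cache), and throughput computed from the final status line that GNU dd writes to stderr. The function names and block count are illustrative.

```python
import os
import re
import subprocess

# GNU dd's final status line on stderr (C locale), for example:
#   1073741824 bytes (1.1 GB, 1.0 GiB) copied, 9.7 s, 111 MB/s
# Older coreutils print "(1.1 GB)" without the GiB part; both forms match.
STATUS_LINE = re.compile(r"^(\d+) bytes .*copied, ([0-9.]+) s")

def dd_throughput_mb_s(stderr_text):
    """Throughput in MB/s (10**6 bytes per second) from dd's stderr output."""
    for line in stderr_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            return int(match.group(1)) / float(match.group(2)) / 1e6
    raise ValueError("no dd status line found")

def run_dd_read(device="/dev/sda", blocks_mib=1024):
    """Hypothetical read test: read `blocks_mib` 1 MiB blocks with O_DIRECT."""
    cmd = ["dd", f"if={device}", "of=/dev/null", "bs=1M",
           f"count={blocks_mib}", "iflag=direct"]
    # Force the C locale so the decimal separator in the status line is ".".
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(cmd, check=True, capture_output=True,
                            text=True, env=env)
    return dd_throughput_mb_s(result.stderr)
```

Parsing the byte count and elapsed time, rather than dd's own rounded "MB/s" figure, keeps more precision when averaging several runs.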
Here, we compare both QEMU versions, both stubdoms, and native.
First, the Mini-OS stubdom already performs better than its non-stubdom counterpart, QEMU traditional in Dom0. Second, there is not much difference between QEMU upstream and the Linux stubdom, and both are on par with native.
For the last benchmark, we measure the time a guest takes to boot. This is done by trying to ssh into the guest after it has been created with `xl create`. The result is the average of 10 boots, in seconds; less time is better.
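The boot-time measurement can be sketched like this. It is a hypothetical harness, not the script used for these numbers: it assumes key-based ssh login with a known host key (so `ssh -o BatchMode=yes` never prompts), starts the clock once `xl create` has returned (the post does not say whether the time spent in `xl create` itself was included), and destroys the guest with `xl destroy` between runs. The probe, clock, and sleep are parameters so the polling loop can be tested without Xen.

```python
import subprocess
import time

def wait_until_reachable(probe, timeout_s=300.0, interval_s=0.5,
                         clock=time.monotonic, sleep=time.sleep):
    """Call `probe()` until it returns True; return the seconds elapsed."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start > timeout_s:
            raise TimeoutError("guest did not become reachable")
        sleep(interval_s)

def ssh_probe(host):
    """True once `ssh host true` succeeds without prompting."""
    cmd = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=1", host, "true"]
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

def mean_boot_time(cfg, domain, host, runs=10):
    """Average, over `runs` boots, of the time from `xl create` until ssh works."""
    times = []
    for _ in range(runs):
        subprocess.run(["xl", "create", cfg], check=True)
        times.append(wait_until_reachable(lambda: ssh_probe(host)))
        subprocess.run(["xl", "destroy", domain], check=True)
    return sum(times) / len(times)
```

The polling interval bounds the measurement resolution: with `interval_s=0.5`, each boot time may be overestimated by up to about half a second plus the ssh connection time.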
Here, the Linux stubdom boots 1 second faster than QEMU upstream and 3 seconds faster than QEMU traditional.
With a stub-domain, we remove some problems that can occur with a device model in Dom0, such as competition or priority inversion with other processes, including other device models. Stub-domains also add an extra layer of protection against security vulnerabilities that could be found in QEMU.
Overall, stubdoms have a good impact on performance, even when running Linux, a heavier kernel than Mini-OS. Once this Linux-based stubdom is ready and well tested, it could be used by default.
The code is available in a git tree at git://xenbits.xen.org/people/aperard/xen-unstable.git, on the branch “stubdom-preview1”.
The work is in the “stubdom-linux” directory. To build it: `make DESTDIR=$an_install_dir -C stubdom-linux install`
Then, to run the stubdom, add this line to your xl configuration file: `device_model_linux_stubdomain_override = 1`