Yoshi Tamura from the NTT Cyber Space Laboratories in Japan gave a very interesting presentation on Kemari. Kemari is a new approach to cluster systems that synchronizes VMs for fault tolerance without modifying either hardware or applications.
Since virtualization puts an abstraction layer between hardware and operating system, it also makes it possible to migrate virtual machines between physical hosts, though that is nothing magical these days. When it comes to high availability for virtual machines today, the usual approach is to keep the image file on shared storage and to hold copies of the configuration file on every potential physical host. If one of the hosts breaks, another takes over and automatically uses its copy of the configuration file to boot the virtual machine.
Kemari goes a step further. The word Kemari comes from a traditional Japanese football game in which the players must not drop the ball. In the context of virtualization it means: Don’t drop the virtual machine!
But how do you make sure the virtual machine keeps running with no noticeable downtime? The most obvious technique for this is synchronization. One could pause the VM, copy a snapshot to a secondary server and unpause the VM on the primary server once the transfer has completed successfully. The snapshot of that moment is then available to the secondary node. But what about storage, network and console events that occur in between the snapshots of the VM?
There are two ways to handle this. The first one is the so-called lock-step approach, which logs the external events on the primary VM and transfers them to the secondary VM, where they are replayed. The disadvantage is the complicated implementation if you are trying to synchronize two VMs across different processor families: there might be events that cannot be replayed easily. The second approach uses continuous checkpoints (the Remus project at UBC). Here the outputs of the primary VM are buffered and delayed until the secondary VM has been updated.
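To make the continuous-checkpoint idea a bit more concrete, here is a toy sketch in Python. It is not Remus code; the class `CheckpointedVM` and all of its method names are made up for illustration, and the "VM state" is just a dictionary. It only shows the rule described above: outputs stay in a buffer until the secondary has received the matching state.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CheckpointedVM:
    """Toy model of output buffering with periodic checkpoints (hypothetical names)."""
    state: dict = field(default_factory=dict)            # live state on the primary
    backup_state: dict = field(default_factory=dict)     # last state copied to the secondary
    pending_output: list = field(default_factory=list)   # outputs held back from the outside world
    released_output: list = field(default_factory=list)  # outputs the outside world has seen

    def run_step(self, key, value, output):
        # The guest changes its state and produces an output,
        # but the output is only buffered, not released.
        self.state[key] = value
        self.pending_output.append(output)

    def checkpoint(self):
        # Copy the state to the secondary first, then release the buffered outputs.
        self.backup_state = copy.deepcopy(self.state)
        self.released_output.extend(self.pending_output)
        self.pending_output.clear()

    def failover(self):
        # The primary is gone: resume from the last checkpoint.
        # Buffered outputs were never visible outside, so dropping them is safe.
        self.state = copy.deepcopy(self.backup_state)
        self.pending_output.clear()
        return self.state


vm = CheckpointedVM()
vm.run_step("counter", 1, "packet-1")
vm.checkpoint()                        # packet-1 is released only now
vm.run_step("counter", 2, "packet-2")  # packet-2 is still buffered
recovered = vm.failover()              # secondary resumes with counter == 1
```

The price of this scheme is latency: every output waits for the next checkpoint, which is why the checkpoint interval matters so much in this kind of design.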
Kemari uses a hybrid approach. It traps events sent by the frontend driver of the primary VM through the Xen event channel and sends them to the secondary VM. Because the VM is snapshotted before the event is sent out to the hardware, there are no inconsistencies if the secondary VM has to take over.
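The ordering is the important part here, so a small Python sketch may help. This is my own simplified model, not Kemari's implementation: the class `EventTriggeredSync` and its methods are hypothetical, the state is a plain dictionary, and the Xen event channel is reduced to a single method call. It assumes that synchronization is triggered only when the guest emits an event towards hardware, and that the event is forwarded only after the secondary has the current state.

```python
import copy


class EventTriggeredSync:
    """Toy model: synchronize on outgoing events, not on a timer (hypothetical names)."""

    def __init__(self):
        self.primary_state = {}
        self.secondary_state = {}
        self.hardware_log = []   # events that actually reached the "hardware"
        self.sync_count = 0

    def guest_compute(self, key, value):
        # Pure in-guest work has no external effect, so no synchronization is needed.
        self.primary_state[key] = value

    def guest_emit(self, event):
        # An outgoing event is trapped: snapshot to the secondary first ...
        self.secondary_state = copy.deepcopy(self.primary_state)
        self.sync_count += 1
        # ... and only then let the event go out to the hardware.
        self.hardware_log.append(event)

    def failover(self):
        # The secondary takes over with the state from the last trapped event.
        return copy.deepcopy(self.secondary_state)


node = EventTriggeredSync()
node.guest_compute("a", 1)
node.guest_compute("b", 2)        # still no sync, nothing left the VM
node.guest_emit("disk-write-1")   # sync happens here, before the write goes out
takeover_state = node.failover()
```

In this model the secondary is never behind anything the outside world has already observed, and no synchronization happens while the guest is only computing.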
Yoshi also brought a demonstration video with him. In it he showed how an xclock kept running although the primary physical server was shut down through the HP iLO board. There was only a short pause (<1 sec), and the VNC session and the clock kept running on the secondary server. His presentation can be found here and an abstract of the Kemari architecture is located here.
The ambitious work and open discussion of these completely new and promising approaches (both Kemari and Remus) will make Xen and virtualization in general more and more popular. Once this is included in the core Xen hypervisor, high availability comes to the masses: high availability 2.0.