Xen 4.2 will contain two new scheduling parameters for the credit1 scheduler which significantly increase its confurability and performance for cloud-based workloads:
ratelimit_us. This blog post describes what they do, and how to configure them for best performance.
The timeslice for the credit1 has historically been fixed at 30ms. This is actually a fairly long time — it’s great for computationally-intensive workloads, but not so good for latency-sensitive workloads, particularly ones involving network traffic or audio.
Xen 4.2 introduces the
tslice_ms parameter, which sets the timeslice of the scheduler in milliseconds. This can be set either using the Xen command-line option,
sched_credit_tslice_ms, or by using the new scheduling parameter interface to
# xl sched-credit -t [n]
Interesting values you might give try are 10ms, 5ms, and 1ms. One millisecond might be a good choice for particularly latency-sensitive workloads; but beware that reducing the timeslice also increases the overhead from context switching and reduces the effectiveness of the CPU cache. Values of 5ms or 10ms give a good balance. The default, 30ms, is probably too long; but we’re going to do some more experimentation and probably switch the default in 4.3. If you try any values that turn out to be particularly good or bad, let us know.
Schedule rate limiting
Hui Lv over at Intel did some fascinating work analyzing the performance overhead of running SpecVirt inside of Xen. What he discovered was that under certain circumstances, some VMs were waking up, doing a few microseconds worth of work, and going back to sleep, only to be woken up microseconds later. The credit1 scheduler correctly identified that these were probably latency-sensitive applications and gave them priority to run whenever they needed to. The problem was that they were causing thousands of schedules per second — in some cases up to 15,000 schedules per second. This meant that there was a very significant amount of time actually spent in the scheduler switching back and forth between the two tasks, rather than doing the actual work.
The credit1 scheduler was giving these processes microsecond latency; but there are very few network-based workloads that require sub-millisecond latency. Hui worked with the Xen development community to introduce a simple mechanism that would be predictable and robust, and improve the performance for this workload without degrading the performance of other workloads. The result was
ratelimit_us. The ratelimit is a value in microseconds. It is a minimum amount of time which a VM is allowed to run without being preempted. The default value is 1000 (that is, 1ms). So if a VM starts running, and another VM with higher priority wakes up, if the first VM has run for less than 1ms, it is allowed to continue to run until its 1ms is up; only after that will the higher-priority VM be allowed to run.
One millisecond is not generally too long for network-based workloads to wait; and the effect is to have more “batching”, so the whole system is used more effectively. This caused significant increase in SpecVirt performance in Hui’s tests.
This feature can be disabled by setting the ratelimit to 0. One could imagine, on a computation-heavy workload, setting this to something higher, like 5ms or 10ms; or if you have a particularly latency-sensitive workload, bringing it down to 500us or even 100us.
This value can be set either on the Xen command-line using
sched_ratelimit_us (Note no “credit” in this one — it’s meant to be consumed by other schedulers as well) or the xl command-line:
# xl sched-credit -r [n]
Curent values of both can also be viewed using xl:
# xl sched-credit Cpupool Pool-0: tslice=30ms ratelimit=1000us Name ID Weight Cap Domain-0 0 256 0
Wait, what’s this “cpupool” thing, you say? That’s another new feature in Xen 4.2 — but the topic of another blog post. Watch this space!