Announcements

Release management: Risk, intuition, and freeze exceptions

George Dunlap

Jun 19, 2014 — 6 min read

I’ve been release coordinator for Xen’s 4.3 and 4.4 releases. For the 4.5 release, I’ve handed this role off to Konrad Wilk, from Oracle. In this blog, I try to capture some of my thoughts and experience about one aspect of release management: deciding what patches to accept during a freeze.
I have three goals when doing release management:

A bug-free release
An awesome release
An on-time release

One of the most time-consuming seasons of being a release manager is during any kind of freeze. You can read in detail about our release process elsewhere; I’ll just summarize it here. During normal development, any patch which has the approval of the relevant maintainer can be accepted. As the release approaches, however, we want to start being more and more conservative in what patches we accept.
Obviously, no patch would ever be considered for acceptance which didn’t improve Xen in some way, making it more awesome. However, it’s a fact of software that any change, no matter how simple or obvious, may introduce a bug. If this bug is discovered before the release, it may delay the release, making it not on-time; or it may not be found until after the release, making the release not bug-free. The job of helping decide whether to take a patch or not falls to the release coordinator.
But how do you actually make decisions? There are two general things to say about this.

Risk

The only simple rule to follow that will make sure that there are no new bugs introduced is to do no development at all. Since we must do development, we have to learn to deal with risk.
Making decisions about accepting or rejecting patches as release coordinator is about making calculated risks: look at the benefits, look at the potential costs, look at the probabilities, and try to balance them the best you can.
Part of making calculated risks is accepting that sometimes your gamble won’t pay off. You may approve a patch to go in, and it will then turn out to have a bug in it which delays the release. This is not necessarily a failure: if you can look back at your decision and say with honesty, “That was the best decision I could have made given what I knew at the time”, then your choice was the right one.
The extreme example of this kind of thinking that of a poker player: a poker player may make a bet that she knows she only has a 1 in 4 chance of winning, if the pay-off is more than 4 to 1; say, 5 to 1. Even though she loses 75% of the time, the 25% that she does win will pay for the losses. And when she makes the bet and loses (as she will 75% of the time), she knows she didn’t make a mistake; taking risks is just a part of the game.
Obviously as release coordinator, the costs of a bug are generally higher than the benefits of a feature. But the general principle — of taking calculated risks, and accepting occasional failure as the inevitable consequence of doing so — applies.

Intuition

But how do we actually calculate the risks? A poker player frequently deals with known quantities: she can calculate that there is exactly a 25% chance of winning, exactly a 5x monetary pay-off, and do the math; the release coordinator’s decisions are not so quantifiable.
This is where intuition comes in. While there are a handful of metrics that can be applied to patches (e.g., the number of lines in the patch), for the most part the risk and benefit are not very quantifiable at all: expert judgement / intuition is the only thing we have.
Now, research has shown that the intuition of experts can, under the right circumstances, be very good. Intuition can quickly analyze hundreds of independent factors, and compare against thousands of instances, and give you a result which is often very close to the mark.
However, research has also shown that in other circumstances, expert intuition is worse than random guessing, and far worse than a simple algorithm. (For a reference, see the books listed at the bottom.)
One of the biggest ways intuition goes wrong is by only looking at part of the picture. It is very natural for programmers, when looking at a patch, to consider only the benefits. The programmer’s intuition then accurately gives them a good sense of the advantage of taking the patch; but doesn’t warn them about the risk because they haven’t thought about it. Since they have a positive feeling, then they may end up taking a patch even when it’s actually too risky.
The key then is to make sure that your intuition considers the risks properly, as well as the benefits. To help myself with this, during the 4.4 code freeze I developed a sort of checklist of things to think about and consider. They are as follows:

What is the benefit of this patch?
What is the probability this patch has a bug?
If this patch had a bug, what kind of bug might it be?
If this patch had a bug, what is the probability we would find it
before the release?

When considering the probability of a bug, I look at two things:

The complexity of the patch
My confidence in my / the other reviewers’ judgement.

Sometimes you’re looking at code you’re very familiar with or is straightforward; sometimes you’re looking at code that is very complicated or you’re not that familiar with. If the patch looks good and it’s code you’re familiar with, it’s probably fine. If the patch looks good but it’s code you’re not familiar with, there’s a risk that your judgement may be off.
When trying to think of what kind of bug it might be, I look at the code that it’s modifying, and consider things on a spectrum:

Error / tool crash on unexpected trusted input; or normal input to library-only commands
Error / tool crash on normal input, secondary commands / new functionality
Error / tool crash on normal input, core commands
Performance
Guest crash / hang
Host crash / hang
Security vulnerability
Data loss

Usually you can tell right away where in the list a bug might be. Modifying xenpm or xenctx? 3 max. Modifying the scheduler? Probably #6. Modifying hypercalls available to guests? #7. And so on.
When asking whether we’d find the bug before the release, consider the kind of testing the codepath is likely to get. Is it tested in osstest? In XenRT? Or is it in a corner case that few people really use?
After thinking through those four questions, and going over the criteria in detail, then my intuition is probably about as well-formed as it’s going to get. Now I ask the fifth question: given the risks, is it worth it to accept this patch?
After giving it it some thought, I went with my best guess. Sometimes I’m just not sure; in which case go away and do something else for a couple of hours, then come back to it (going over again the four questions to make sure they’re fresh in my mind). The first few dozen times this took a very long time; as I gained experience, judgements came faster (although many were still painfully slow).
In some cases, I just didn’t have enough knowledge of the code to make the judgement myself; this happened once or twice with the ARM code in the 4.4 release. In that case, my goal was to try to make sure that those who did have the relevant knowledge were making sound decisions: thinking about both the benefits and the risks and weighing them appropriately.
For those who want to look further into risk and intuition, several books have had a pretty big influence on my thinking in this area. Probably the best one, but also the hardest (most dense) one, is Thinking, Fast and Slow, by Daniel Kahneman. It’s very-well written and accessible, but just contains a huge amount of information that is different to the way you normally think. Not a light read. Another one I would recommend is The Black Swan, by Nassim Nicholas Taleb. And finally, Blink, by Malcolm Gladwell.

Risk

Intuition

Read more