Baseball Musings: Royal Calculation

April 14, 2003

Royal Calculation

Jed Roberts points out a flaw in my calculation of the Royals' chances of winning eight in a row.

I think your calculation of the probability of there being at least one string of 8 consecutive victories in the Royals' season is not quite right. If I understand how you did it, you compute the probability of there not being a string of 8 wins in a row (1 - p^8), and then take that to the 155th power, there being 155 possible starting points for a string of 8 successes in a season. Your answer is then 1 minus that result. (I.e., 1 - (1-p^8)^155)

The problem is that these 155 cases are not independent, so that you can't just multiply the probabilities together. For example, the games 1-8 and the games 2-9 have 7 games in common, so the probability of 8 successes in a row starting at game 2 is not independent of what happened in the 8 games starting at game 1. In particular, if the Royals failed to win 7 games in set 1-8, they have 0 chance of winning all 8 games 2-9.

Another way to see that this method isn't right is to consider a much simpler case: what is the chance of seeing a string of 2 wins in a season of 3 games? For p = .5 your method would yield 1 - (1 - 0.5^2)^2 = 7/16. But the correct answer is 3/8, since of the 8 equally likely possible outcomes, only 3 contain strings of 2 wins: WWL, LWW, WWW.

At this point, if I were as smart as Larry, I would proceed to calculate the correct answer. However, being just a dummy, I cannot do this. At least, not without a lot more cogitation. It seems to me that the correct answer involves solving a rather difficult problem in combinatorics.

During my calculations, I made an independence assumption: I assumed that all eight-game stretches were independent. As Jed's example shows, that's not the case. However, my calculation errs by making the probability too high, which means I'm less likely to call a streak significant. So I'm being cautious in the direction I want to be.
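For anyone who wants to check the numbers, here is a minimal sketch of one way to get the exact answer without heavy combinatorics. It is not the method used in the original calculation. It walks through the season game by game, tracking the probability of each current winning-streak length among seasons that have not yet had a streak of the target length. The function names are made up for illustration, and p = 0.5 is an assumed win probability, not the figure used for the Royals.

```python
def prob_streak(n_games, streak_len, p):
    """Exact probability of at least one run of `streak_len` wins in
    `n_games` independent games, each won with probability `p`."""
    # state[j] = probability that no qualifying run has happened yet and
    # the team is currently on a winning streak of exactly j games.
    state = [0.0] * streak_len
    state[0] = 1.0
    for _ in range(n_games):
        new_state = [0.0] * streak_len
        # A loss resets the current streak to zero.
        new_state[0] = (1 - p) * sum(state)
        # A win extends the streak by one; a win at streak_len - 1
        # completes the run, so that probability leaves the table.
        for j in range(streak_len - 1):
            new_state[j + 1] = p * state[j]
        state = new_state
    return 1 - sum(state)


def independence_approx(n_games, streak_len, p):
    """The approximation from the post: treat every window as independent."""
    windows = n_games - streak_len + 1
    return 1 - (1 - p ** streak_len) ** windows


# Jed's small case: a run of 2 wins in a 3-game season with p = 0.5.
print(prob_streak(3, 2, 0.5))          # exact: 3/8
print(independence_approx(3, 2, 0.5))  # approximation: 7/16

# A 162-game season with an assumed p = 0.5 (hypothetical value).
print(prob_streak(162, 8, 0.5), independence_approx(162, 8, 0.5))
```

On the three-game case this reproduces Jed's 3/8 against the approximation's 7/16, and for the full season the approximation again comes out higher than the exact value, which is the cautious direction described above.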

These independence assumptions are often made in this line of work. In my day job, we research language models for information retrieval, and we are constantly making independence assumptions to make calculations tractable. The one I made for the Royals calculation was a good approximation, but it makes the data look less significant than it is.

Posted by David Pinto at 11:10 AM