March 25, 2009

Testing the Lineup Analysis Tool

An email from a reader prompted me to test how accurately the Lineup Analysis Tool predicts runs scored. The tool was built on data from 1998-2002, so I decided to see how well it actually predicted team seasons from 2003-2008. Since the Day by Day Database contains statistics by lineup position, it just took a query to perform the Lineup Analysis calculation on the actual OBA and slugging percentages by slot for all 180 team seasons. The difference, Predicted - Actual runs per game, yielded the following statistics:

  • Mean: 0.0585 runs/game, nine runs over a season.
  • Standard deviation: 0.145 runs/game, about 23.5 runs over a full season. Two thirds of the observations should fall between -14 and 33 runs over a full season.
  • Minimum difference: -0.32 runs per game.
  • Maximum difference: +0.5 runs per game.

The tool underestimated the 2008 Minnesota Twins by 0.32 runs and overestimated the 2005 Arizona Diamondbacks by 0.5 runs, the two biggest outliers. There were six teams the tool predicted with zero difference to two decimal places.
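The summary statistics above can be sketched with a few lines of Python. This is a minimal illustration, not the actual study code; the `diffs` list holds hypothetical per-game differences (predicted minus actual) standing in for the real 180 team seasons.

```python
import statistics

GAMES = 162  # games in a full MLB season

# Illustrative per-game differences (predicted - actual); the real study
# would have one entry per team season, 180 in all.
diffs = [0.0585, -0.32, 0.5, 0.0, 0.12, -0.05]

mean = statistics.mean(diffs)
stdev = statistics.stdev(diffs)

print(f"Mean: {mean:.4f} runs/game ({mean * GAMES:.1f} runs over a season)")
print(f"Std dev: {stdev:.4f} runs/game ({stdev * GAMES:.1f} runs over a season)")
print(f"Min: {min(diffs):.2f}  Max: {max(diffs):.2f}")

# Roughly two thirds of team seasons should fall within one standard
# deviation of the mean, i.e. between (mean - stdev) and (mean + stdev)
# runs per game -- multiply by GAMES to get the full-season spread.
```

With the study's actual numbers (mean 0.0585, standard deviation 0.145), multiplying by 162 games gives the roughly 9-run average miss and 23.5-run spread reported above.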

The following graph shows how well the tool predicted runs per game:

Lineup Tool Test

So the tool tends to predict too many runs. It’s not perfect, but it is in the ballpark.

Here’s the spreadsheet if you’d like to look at the results in more detail:

Lineup Tool Predictions.

10 thoughts on “Testing the Lineup Analysis Tool”

  1. gsw

    That’s not bad at all. The tool doesn’t account for injuries (and the generally lower stats of the replacement players), correct?

  2. David Pinto Post author

    Pre-season it doesn’t. However, this analysis is based on what actually happened during the season, so injuries and pinch hitting and lineup changes are indeed included.

  3. Jeff

    I tend to predict retail sales by day using a similar method. It also overestimates. Maybe you could use the same -10% economy modifier I’m using, but for runs.

    I’m new here… how do I use this tool?

  4. Bjoern

    An R^2 of .9 is pretty good IMO. I guess fluctuations in real offenses are responsible for the overestimation; the lineup tool is a 100% consistent offense, if you will.
    I would like to see whether teams (managers) consistently overperform the averages or whether steals etc. play a role.
    If you ever consider giving your data away, please let me know. 😉

  5. bsball

    That looks like a pretty good fit to me. Can you say that the tool tends to predict too many runs? It looks to me like the mean is very close to 0, especially relative to the sample variance.

  6. David Pinto

    The mean isn’t 0, it’s positive. So yes, it tends to over-predict runs, but just a little. It could be with another 180 team seasons the mean will be closer to zero.

  7. bsball

    The sample mean is positive. So, within your sample data the tool tends to be high. The “true” mean could easily be 0 (or even negative). Would the tool tend to be high outside your sample? Based on the statistics you showed we can’t tell whether the true mean is 0 or not. That’s all I was saying.

  8. ptodd

    That’s a good fit, especially for the Red Sox. Some other run estimators tend to significantly overstate the Red Sox’s scoring.

  9. phorever

    Ummm… isn’t that about the same fit (or even a little worse) as you’d get using just team OPS to predict team run totals? If so, it looks to me like this is saying that lineup doesn’t matter.

  10. Pingback: Beaneball
