March 26, 2005
Defensive Charts
The Easter Bunny arrived a day early for me, and instead of colorful eggs I received colorful charts representing each fielder's probabilistic model of range. For each player, there's a set of graphs with a black line for actual outs, a yellow line for predicted outs, a red line for the difference between the two, and blue lines representing the best and worst values for those data points by qualifying players.
I want to thank and congratulate Dave Stasiuk for his hard work and excellent programming skills in creating these web pages. Thanks to Studes for many mock-ups and helpful suggestions. I'm having a great time looking at these graphs, and I hope you do also.
Go here to see a list of players. Then click on the name, and you'll get complete charts for every position played by that fielder, one for each batted ball type. Enjoy!
Please feel free to use the comments to suggest any improvements.
Update (7:00 AM EST March 27, 2005): David Stasiuk explains the blue lines:
It's +/- 1 standard deviation from the average AO/BIP for each vector by position by BIP type, as weighted by the number of balls in play for each player...so, for example, a shortstop who played one game, and had one ball hit to him and made it for 1.0 AO/BIP would only have one record at 1.0 in calculating the standard deviation, whereas a shortstop who played every game and had 0.93 AO/BIP in 200 BIP to the same vector over the course of the season would have 200 records at 0.93 in calculating the standard deviation.
This gave me a solid +/- for every position for every vector...it isn't a min/max so much as it is a standard range of performance. Basically, if a regular player is either over or under that range endpoint, you know that they're either really good or really bad for that particular BIP type and vector.
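To make the weighting concrete, here is a minimal Python sketch of a BIP-weighted standard deviation for a single position, BIP type, and vector. It is an illustration of the description above, not the code behind the charts. The function name `weighted_mean_sd` and the two-player sample are hypothetical, and it assumes a population-style standard deviation (dividing by total BIP), which the explanation does not specify.

```python
import math

def weighted_mean_sd(records):
    """Return (mean, sd) of AO/BIP, weighting each player by balls in play.

    records: list of (ao_per_bip, bip) pairs for one position, BIP type,
    and vector. Weighting by bip is equivalent to repeating each player's
    rate once per ball in play, as in the shortstop example.
    Assumption: population standard deviation (divide by total BIP).
    """
    total_bip = sum(bip for _, bip in records)
    if total_bip == 0:
        raise ValueError("no balls in play")
    mean = sum(rate * bip for rate, bip in records) / total_bip
    variance = sum(bip * (rate - mean) ** 2 for rate, bip in records) / total_bip
    return mean, math.sqrt(variance)

# Hypothetical data: the one-game shortstop and the everyday shortstop.
players = [(1.0, 1), (0.93, 200)]
mean, sd = weighted_mean_sd(players)
low, high = mean - sd, mean + sd  # the two blue lines for this vector
```

With these two records the one-game shortstop barely moves the band: the mean stays near 0.93 and the band is narrow, because 200 of the 201 weighted records sit at 0.93.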
Posted by David Pinto at 09:56 PM | Defense | TrackBack (1)
Couple questions: 1) How is the predicted out arrived at? In other words, in the fairly random sport of baseball, how can you accurately predict how many outs per vector a player will get? Correspondingly, wouldn't a line showing actual vs. league average be more helpful? And 2) How can a player have a data point higher (or lower) than the numbers for best and worst? I understand it might be a qualifying issue, but how can a player who starts almost every day have a qualifying issue?
Patrick.
Amazing David. Excellent work!
Are you going to sell player sponsorships?
Hey, good work! Out charts for pitchers are what I wanted to see, to be able to compare two good hitting pitchers like Carlos Zambrano and Jason Jennings.
Very, very interesting and provocative work.
One thing I think really needs to be on these graphs is the sample size. A simple n=# in one corner would be fine. I know the SD bars are giving us the necessary information about your confidence intervals, but the raw number would be helpful when discussing the charts with laymen.
Great work guys. Thanks a ton.
Am I correct in assuming that the graph moves from left to right in the same manner that the different vectors move for each position? In other words, does the far left AO and PO point correspond with the farthest left vector on the field for that position (ie. for a leftfielder, the leftfield foul line)?
David, I'm a bit worried about your +/- 1 standard deviation bars. Look at Alexis Rios' page:
http://pages.map.com/pinto/charts/20902004.htm
In one case, the error bars don't appear to include the data point. This is unlikely to be correct.
When the sample size is small (if the expected number of outs at each point in the vector is less than about 10, and possibly when it's less than 30) the normal approximation to the binomial distribution is inaccurate, and these standard deviation bars are misleading. It would be better to give genuine 90% confidence intervals, or something like that, and not to use the normal approximation.
But my guess is that this would just illustrate that almost no player has enough play data to be statistically significantly different from the mean. i.e. insufficient data here to conclude much. Could you comment on that?
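As a rough illustration of the point about small samples, the Python sketch below compares a normal-approximation interval with an exact binomial interval for one player's out rate on one vector. The choice of the Clopper-Pearson method, the function names, and the 9-outs-in-10-BIP example are all assumptions made for illustration; none of them come from the charts or from the comment above.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def solve_decreasing(f, target):
    """Bisection on [0, 1] for f(p) == target, where f decreases in p."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(outs, bip, level=0.90):
    """Clopper-Pearson (exact binomial) interval for the out rate."""
    alpha = 1 - level
    # Lower bound p solves P(X >= outs | p) = alpha / 2.
    lower = 0.0 if outs == 0 else solve_decreasing(
        lambda p: binom_cdf(outs - 1, bip, p), 1 - alpha / 2)
    # Upper bound p solves P(X <= outs | p) = alpha / 2.
    upper = 1.0 if outs == bip else solve_decreasing(
        lambda p: binom_cdf(outs, bip, p), alpha / 2)
    return lower, upper

def normal_interval(outs, bip, z=1.645):
    """Normal-approximation interval; z = 1.645 gives roughly 90% coverage."""
    rate = outs / bip
    half = z * math.sqrt(rate * (1 - rate) / bip)
    return rate - half, rate + half

# Hypothetical small sample: 9 outs on 10 balls in play.
exact_low, exact_high = exact_interval(9, 10)
norm_low, norm_high = normal_interval(9, 10)
```

For this sample the normal-approximation upper bound lands above 1.0, which is impossible for a rate, while the exact interval stays inside [0, 1] and is noticeably asymmetric around 0.9.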
Amazing!
Check out Greg Maddux, compared and contrasted to Kenny Rogers... Troy Glaus vs Scott Rolen... I've said it before, but the amazing thing about these is that the graphs look like the actual performance of the players one sees on the field.
I'd still like to know whether the "predicteds" should be thought of in general as "average performer" or "replacement performer?" Or something else?
This is a terrific resource, even before any tweaking, just to have a place to go for a general idea if you have cause to wonder about the defense of a guy you've never seen play...
Jay, that is a correct assumption.
Andrew, thanks for the suggestion.
John,
Predicted outs is an average. It's the expected DER on that vector based on the parameters of the ball in play.
Excellent job! Perhaps I'm inferring too much from the info, but when I look at the charts, is it safe to infer that:
a) a player who exceeds expectations, particularly on line drives (where there is less time to react) benefitted some from positioning?
b) an outfielder that performs under expectations on line drives but average or better on flyballs suffers from diminished range? Let's call it the Reggie Sanders' effect....
Look at Jeter and A-Rod on fly balls down the third-base line. Is there any adjustment for discretionary fly balls?
These are amazing.
Look at Ichiro's numbers. He's pretty good on line drives across the board. On fly balls, however, he's great towards center field, then gets progressively worse as he goes to the right field line (with some wackiness going on in foul territory.)
Now look at Randy Winn. He's average on fly balls hit right to him, but sucks going to either side.
So it seems to me that Ichiro has to position himself closer to centerfield to cover for Winn, which leaves far right field more vulnerable.