March 09, 2005
Charting Range
Over the last few days I've been chatting with Robert Saunders about presenting data graphically. He pointed me to this post on Edward Tufte's web site, where he's trying to present charts that are the size of words. I'm not there yet, but Robert did get me thinking about presenting the Probabilistic Model of Range graphically. I thought I'd give it a try with David Eckstein, since there were some arguments over whether the data properly reflected his abilities.
What I've done is break the data down by ball in play type (grounders, flies and liners). Each chart below has the direction of the ball on the X-Axis. The Y-Axis represents the probability of turning those balls into outs. Eckstein's actual probability is compared to the predicted probability. For reference, a vector of -4 (minus 4) represents the third-base line, and 8 represents straight away centerfield. Here's Eckstein on grounders in 2004:
As you can see, David is great when the ball is hit right at straight away short. But once he starts moving left or right, he becomes a below average fielder. Nothing terrible, just below average.
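For readers who want to try this kind of breakdown themselves, here is a minimal Python sketch of the grouping step described above: bucket each ball in play by type and vector, then compare the actual out rate with the average predicted probability. The record layout, the `chart_points` name and every number are assumptions made up for illustration; this is not the actual model code or Eckstein's data.

```python
from collections import defaultdict

# Hypothetical records: (ball_type, vector, predicted_out_probability, was_out).
# These values are invented for illustration; they are not real fielding data.
balls_in_play = [
    ("grounder", 4, 0.80, True),
    ("grounder", 4, 0.80, True),
    ("grounder", 5, 0.60, False),
    ("grounder", 5, 0.60, True),
    ("fly", -5, 0.30, False),
]

def chart_points(records):
    """Return {(ball_type, vector): (actual_rate, predicted_rate, n)}."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # outs, summed predictions, count
    for ball_type, vector, predicted, was_out in records:
        bucket = totals[(ball_type, vector)]
        bucket[0] += int(was_out)
        bucket[1] += predicted
        bucket[2] += 1
    return {
        key: (outs / n, pred_sum / n, n)
        for key, (outs, pred_sum, n) in totals.items()
    }

points = chart_points(balls_in_play)
for (ball_type, vector), (actual, predicted, n) in sorted(points.items()):
    print(f"{ball_type:8s} vector {vector:3d}: "
          f"actual {actual:.2f}, predicted {predicted:.2f} (n={n})")
```

Each (type, vector) pair gives one point on the actual line and one on the predicted line of a chart.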
Now let's look at fly balls.
I really love the information this chart conveys. It shows that fly balls are usually caught by shortstops around the normal position, they go down around third base, but pick up again in foul territory. And this shows why David does so poorly. He does not catch pop ups in foul territory. With the Cardinals, he has a great fielding third baseman in Rolen, so Scott will have to go after balls the shortstop usually gets.
Finally, the line drive chart.
He's just way below average to his left. Even at balls hit right at the position, he doesn't do well. Does he not react quickly?
I'll be doing a few more of these. I hope you find them as informative as I do.
Update: Fixed a left-right problem. I said that Eck was below average to his right. I meant left. Thanks, Studes.
Baseball Musings is holding a pledge drive during March. Click here for details.
Yes! David, this is the most exciting thing I've ever seen on your site -- and I don't say that lightly. This is a great example of how graphs and charts can show information much more easily than sheer numbers can. This is superb. Congrats.
Obviously, you need to work on the format of your graphs to make them more readable. But this is so powerful even as is.
I agree with Studes, these are really simple and powerful visuals.
I have a couple questions though: How do you define straight away short? Is it by the position of the ball relative to the field or to the fielder where he is placed? In other words if the Angels have a shift on and the ball is hit directly at Eckstein as he stands behind 2nd base, does that count as a vector of 8 or 0? Also, you probably covered this before, but where do you get your batted-ball data?
Also, when you say "Eckstein is way below average to his right" don't you mean to his left?
SamW, the vectors are fixed. When I say straight away, I mean the position a shortstop would normally play. So 8 is always straight away center.
I must echo Studes. Wow. I want to see these charts for every player out there. Just fascinating stuff. Can you show Jeter? I'm very curious...he's so much ballyhooed by everyone but the statheads, who like to make him their overrated-defense whipping boy...the gut says that Jeter makes the crowd-pleasing, spectacular types of plays well (liners and flies, perhaps?) but has terrible left to right gb range. I'll bet CF is another great position to see with this tool. Kudos David!
This is really, really cool. Aside from some reservation I have about what would be expected given random chance (and thus not drawing too direct conclusions on the margins of error), this gives more information than any single number I have ever seen for a defender, and has the potential to be more accurate too.
OK, before the constructive criticism begins - these pictures are brilliant.
With that out of the way....
There's something amiss with the way you are describing those vectors. If 4 is the thirdbase line and 8 is straightaway center, shouldn't a shortstop's expected peak be higher than 4?
I'd buy that 0 is the thirdbase line.
Also, if those are Excel graphs, you can clean them up quite a bit by formatting the axis, and changing the "Tick mark labels" from "Next to axis" to "Low"
This is a pain to do every time, of course, so record a "FixGraph" macro.
Personally, I'd normalize the graph to put the center of the expected range at 0, and think really hard about whether the right side of the graph should represent the player going left (ie, showing the range as seen from home plate), or going right (showing the range facing home plate). The picture looks correct to me as you've done it, but it does make talking about it funny.
And really, the picture of a player's defense should show those graphs superimposed on each other, with the expected curve in the background. (Certainly, it's easier to talk about as three separate graphs, though - I just like the bubble gum card picture).
Finally - what is that little bobble in the expected ground ball curve over by the second baseman's position?
Danil,
The third base line is at -4, not 4. I'm looking at the post, and can see the browser is not keeping the minus sign with the 4. I'll try to fix that.
Thanks for your other suggestions!
For the line drives -- could it just be that eckstein isn't very tall? or is this somehow factored in?
either way, those graphs are pretty neat.
David, this is great! What would be even better is if you wrote a Python script that would generate these graphs from the data. You could stick all the data into a MySQL database, and then have a script that would generate the graphs. You could even have it so a user could compare two players and have their graphs show up together.
This is exciting!
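A rough sketch of the database idea, using Python's built-in sqlite3 module as a stand-in for MySQL so it runs without a server. The `range_points` table, its columns, the `compare` function, the player names and all the rates are hypothetical, invented only to show how a two-player comparison query might look.

```python
import sqlite3

# In-memory database standing in for the MySQL server suggested above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE range_points ("
    "player TEXT, ball_type TEXT, vector INTEGER, out_rate REAL)"
)

# Invented rows for two hypothetical players; not real fielding data.
conn.executemany(
    "INSERT INTO range_points VALUES (?, ?, ?, ?)",
    [
        ("Player A", "grounder", 4, 0.85),
        ("Player A", "grounder", 5, 0.55),
        ("Player A", "fly", 4, 0.30),
        ("Player B", "grounder", 4, 0.80),
        ("Player B", "grounder", 5, 0.65),
    ],
)

def compare(conn, player_a, player_b, ball_type):
    """Return (vector, rate_a, rate_b) rows for vectors both players have."""
    query = """
        SELECT a.vector, a.out_rate, b.out_rate
        FROM range_points AS a
        JOIN range_points AS b
          ON a.vector = b.vector AND a.ball_type = b.ball_type
        WHERE a.player = ? AND b.player = ? AND a.ball_type = ?
        ORDER BY a.vector
    """
    return conn.execute(query, (player_a, player_b, ball_type)).fetchall()

rows = compare(conn, "Player A", "Player B", "grounder")
for vector, rate_a, rate_b in rows:
    print(f"vector {vector}: A {rate_a:.2f} vs. B {rate_b:.2f}")
```

The rows that come back could be handed straight to whatever charting tool draws the two lines together.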
Wow, that is awesome. One question though. Is there data for range forward and back? I think that's especially important on pop flies. And, if there is, is there any way you could make a 3 axis graph?
Great project. The A's and I wish you had our tandem of Chavez, Crosby, Ellis, Ginter, and Scutaro all graphed out!
Just to complicate matters; Ginter @ 3b & 2B, Scoot @ 2B & SS.
This chart is excellent for range. Will there be one for accuracy of throws from the various infield positions?
I guess I'm leading to the fielder that gets to all those spots (1), then accuracy in completing the play with an accurate throw to 1B (2), 2B (3), 3B (4), or home (5)
Fascinating stuff. One of the primary comments as you were charting individual positions was whether a 3B or SS with great range was poaching balls from their neighbor. Since the vectors travel across the whole field, could you produce a graph of an entire infield? This might begin to capture where positioning and/or range was interfering with individual player data.
Also, expounding on the 3D graph idea, would it be possible to plot an entire defense? Let the x-axis, coming out of the screen, be a radial distance from home plate, then the y-axis, left to right on the screen, be the same left to right vector setup you used here. And then the z-axis would be the probability. A graph of this type would make it easy to visualize "holes" in defensive schemes. But I haven't read up enough on the stats used to know if this is really possible.
You should seriously consider publishing a book containing this data for all players. This is cutting edge stuff.
Re the line drives: "Even at balls hit right at the position, he doesn't do well. Does he not react quickly?"
Someone already mentioned this above, but Eck is quite short, and simply can't jump as high as other SS.
The pop flies really show why PMR and UZR have such a different read on Eck. UZR had him at around average for 2004, which your groundball data also conveys; UZR, IIRC, does not consider infield popups.
Question: how were Angel 3B on pop flies? I remember being somewhat surprised that Legs Figgins rated so well by PMR, and I wonder if it was due to him catching more popups than most 3B.
Also, Eckstein rated just fine in 2003 by PMR. Are the popups that much different?
I wonder if Eckstein's 2004 numbers might be suffering from him playing next to a 3B that was also a CF, and possibly better than the average 3B at getting to flyballs. If Figgins makes a play, Eckstein can't.
Count me in the camp that thinks these graphs are the coolest thing I can remember seeing online. And the potential is really exciting on top of that.
I'd like to second the motion of putting an entire infield defense on a graph like this to see just who fields which types of balls. Different infield makeups would be really interesting - an infield with Ozzie and three scrubs would probably look a lot different than an infield with average defenders around the infield.
Please keep these coming! I think I'll donate to the pledge drive just based on this alone.
Come on people! Start donating! I'm sure that this is just a small sample of what David has in store for all of us if we help him work on this site full time. Just imagine a whole site of interactive graphs like this where you can compare players or check out whole teams. He can do it WITH YOUR HELP! So donate already! If you've donated, DONATE AGAIN! Let's get this guy self-employed!
These graphs are excellent, but I found myself wondering about the relative frequencies of the different kinds of BIP. Obviously there will be more grounders in the 4 direction (i.e. straight at the SS) than foul flies in the -6 direction. It would be great if you could find some way to include that BIP distribution info in the graphs somehow.
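One way to fold the BIP counts into the picture is to turn each rate gap into a number of outs: multiply the difference between actual and predicted probability by the number of balls hit in that direction. A minimal Python sketch follows, where the `summary` layout, the `outs_vs_expected` name and all the counts and rates are invented for illustration only.

```python
# Hypothetical per-vector summary:
#   vector -> (balls in play, actual out rate, predicted out rate).
# Invented numbers; not real fielding data.
summary = {
    -6: (5, 0.00, 0.20),
    4: (100, 0.85, 0.80),
    6: (40, 0.45, 0.50),
}

def outs_vs_expected(summary):
    """Convert each rate gap into a count of outs gained or lost."""
    return {
        vector: n * (actual - predicted)
        for vector, (n, actual, predicted) in summary.items()
    }

gaps = outs_vs_expected(summary)
for vector in sorted(gaps):
    print(f"vector {vector:3d}: {gaps[vector]:+.1f} outs vs. expected")
print(f"total: {sum(gaps.values()):+.1f}")
```

With numbers like these, a big gap on a rarely hit direction costs fewer outs than a small gap where most balls go, which is the point of showing the distribution.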
It seems odd that Eckstein wouldn't catch ANY pop-ups on the third-base side. Was it an organizational philosophy that those balls should be ignored by the SS? Do the Anaheim 3Bs and LFs come out looking good because they get to catch all those balls? Should we penalize Eckstein for not making plays that somebody else made? (Or does your system ignore those already?)
Looking at the 3B and LF data, Chone Figgins comes in #3 (80 GS at 3rd) and Jose Guillen (135 GS in LF) is 17th among qualifiers, but there are only 4 better OFs with as many BIP as Guillen. My guess is that both of these guys greatly benefitted from catching balls that "should" have been Eckstein's.
Just fantastic stuff. I've never been completely sold on the PMR (just my own personal hangup), but these graphs are probably the most compelling defensive analysis I've ever seen.
One glance and you can start to understand the abilities and liabilities of a defender.
David-- Had I been writing a scouting assessment of Eckstein at the end of last season it would have said "Has lost the half step he couldn't afford to lose. Range now worse than poor. Never was tall enough, no good on pop fouls, not tall enough or quick enough toward second on liners. Must have .380 OBP to be any use at all, and nowhere close to that any more... "
So needless to say I think this is brilliant-- it's much easier to tell people where they can go to see it than it is to describe it to them... I think a library of these could be used to teach people how to observe what they are seeing...
And like others, I'd love to see one on Jeter-- it's so hard to really see why his numbers suck so badly... I'm definitely seeing not observing in his case...
And I think there's enough here to chew on for quite a while-- while I sympathise with those who want to see whole defenses, extra dimensions, throwing skills etc; I think they're being a bit greedy... I'm not sure this solves the problem of evaluating defense-- but I think it provides a representation of reality from which such a solution can be derived-- that is, we can look at two folks with similar raw numbers and see why those numbers ARE raw-- why they don't describe similar players. If we had this for Rogers Hornsby for example, we would be able to see why he was thought a defensive liability despite numbers which don't look bad... (supposedly he was fine on grounders but awful on balls in the air IIRC.)
David I am myself unemployed and faced with ballooning credit card balances-- it's irresponsible to my mortgagee for me to eat, let alone donate money to others; but I promise when I win the lottery I'll fund you for a year to write your first (?) book? This is impressive stuff.
This is indeed great. I would only echo Danil's suggestions. It seems to me that it would be much better to make straightaway shortstop 0, since it's easier to visualize the fact that balls hit directly to you ought to be associated with higher probabilities of outs.
Alternatively, perhaps you can simply put a line at -4 and 8 to signify the third-base line and straightaway center. If you want to get creative, you might even color the foul territory side of the graph differently.
Never thought something so simple-looking could be so cool! Just to re-iterate what fret mentioned, I think the charts would be even more useful if overlaid with a bar chart of number of BIP in each of the different directions. And it'd be really cool to fill in the rest of the infield defense :)
Jason
Just a quick question that may or may not be stupid....From where did you get your data set? How do you even go about starting something like this?
This is really nice! A few questions/comments:
First, the fly ball chart is a little funky. How far back into the OF does that thing go so that the average SS only has a 27% chance of catching a fly ball?
Second, it would be great if we could see the variance (error bars) on each point. For example, I doubt that his statement that Eckstein is "below average" at grounders is supported by the small differences seen. He seems exactly average to me.
Finally, this throws a little cold water on the hypothesis that Cardinal pitchers will suffer due to a loss of defense up the middle. With Eckstein doing just fine at grounders and with such an extreme GB/FB ratio in the St. Louis pitching staff, they shouldn't be hurt as badly as some have been suggesting.
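The error-bar point can be roughed out with a binomial standard error. The Python sketch below assumes each ball in a direction is an independent trial with the predicted probability of becoming an out, and uses a normal approximation that gets shaky with small counts. The function names and the sample numbers are made up for illustration; this is not how the charts themselves were built.

```python
import math

def standard_error(rate, n):
    """Binomial standard error of an out rate observed over n balls in play."""
    return math.sqrt(rate * (1.0 - rate) / n)

def within_noise(actual, predicted, n, z=1.96):
    """True if the actual rate is within z standard errors of the prediction.

    Assumes independent trials and a normal approximation (an assumption,
    and a rough one when n is small).
    """
    return abs(actual - predicted) <= z * standard_error(predicted, n)

# Invented example: the same 5-point gap on 40 balls and on 1000 balls.
for n in (40, 1000):
    verdict = "within noise" if within_noise(0.55, 0.60, n) else "outside noise"
    print(f"n={n}: actual 0.55 vs. predicted 0.60 is {verdict}")
```

The same gap can be indistinguishable from average on a season's worth of balls in one direction and meaningful on a much larger sample, which is why the counts behind each point matter.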
We have to change the old saw to "A picture's worth a 1000 spreadsheets". Terrific stuff.
I echo the sentiment about "normalizing", that is, to show "difference from average". Then, plot all four infielders on the same graph (as well as a team sum). Then you could tell right away whether a shortstop is losing outs to the 3rd baseman or just yielding hits into the gap.
And heck, use a polar chart while you're at it.
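Here is a small Python sketch of the "difference from average" idea, with a team sum per vector. It assumes each fielder's rate in a direction is measured against all balls hit in that direction, so the differences can be added across positions. The `infield` layout, the `differences` name and the numbers are invented for illustration.

```python
# Hypothetical rates: position -> {vector: (actual, predicted)}.
# Invented numbers; not real fielding data.
infield = {
    "3B": {-2: (0.70, 0.65), 0: (0.30, 0.25)},
    "SS": {0: (0.20, 0.30), 4: (0.85, 0.80)},
}

def differences(infield):
    """Return per-position {vector: actual - predicted} and the team sum."""
    per_position = {
        pos: {v: actual - predicted for v, (actual, predicted) in rates.items()}
        for pos, rates in infield.items()
    }
    team = {}
    for diffs in per_position.values():
        for vector, diff in diffs.items():
            team[vector] = team.get(vector, 0.0) + diff
    return per_position, team

per_position, team = differences(infield)
for vector in sorted(team):
    parts = ", ".join(
        f"{pos} {diffs[vector]:+.2f}"
        for pos, diffs in per_position.items()
        if vector in diffs
    )
    print(f"vector {vector:3d}: team {team[vector]:+.2f} ({parts})")
```

In this made-up case the shortstop is 10 points short at vector 0 and the third baseman picks up 5 of them, so the team line still shows a hole there.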
It looks like Jim beat me to my comment, though I would describe it a little bit differently: I'd like to see what these pictures look like at a team level.
In particular, I'd like to see a picture that demonstrates whether or not infield popups are really discretionary plays.
I agree with everyone else that says these are great graphs. They tell a story, and are easy to understand.
Just some suggestions. My preference with Excel graphs is to change the grey background to white, and change those y-lines (the horizontal black lines) to dashed grey ones, to put them in the background. They take up too much importance when they are solid black.
Great work.
Would you post Vizquel by comparison? He's a 9x gold glover that a lot of people disparage. Some people think he's the second coming of Ozzie Smith and others think he's the second coming of Ozzie Canseco.
From your work and my observations, I'd guess Vizquel is a little below average on line drives (short), well above average on pop flies, somewhat above average to his left (good range/hands) and somewhat below average to his right (weak arm).