I did a poor job of blogging Mitchel Lichtman’s talk on Sunday, but there were some points on which I’d like to comment.
The first is that Mitchel stopped penalizing the other fielder when an out is made on a shared play. A shared play is one where there is some probability that more than one fielder might catch the ball (think of a fly ball to right-center). Mitchel noted that he’s gone back and forth on this over the years, but finally decided against the penalty. His argument was that the batted ball locations we get from the human scoring systems are not precise, and the reason one fielder makes the play rather than the other is that he was closer to the ball to start.
The Probabilistic Model of Range (PMR) does give the fielder not making the out the penalty, as I believe there is information there. Back in the 1990s, STATS, Inc. published Zone Ratings, and Ken Griffey, Jr. did poorly. That surprised people, because they always saw him on ESPN making these fantastic catches at the wall. John Dewan asked me to look into this, and I found the problem was that Griffey was the opposite of a ball hog. He allowed his left and right fielders to take balls that he should have been catching. Now, it’s possible this was a good strategy; Griffey could play deep and make the catches at the wall while the LF and RF took the short flies. It made him look bad compared to other centerfielders, who tried to catch everything. Ken had the reputation when he was young of not trying on every play, however, so maybe Ken just didn’t want to go after balls and let his side fielders take up the slack. Since there wasn’t an overabundance of balls dropping in for hits in front of him, I somewhat discount the playing deep theory.
PMR does not try to turn the numbers into runs, as UZR does. In a system that assigns run values, I agree that the fielder not making the out should not be penalized. After a few minutes of thinking about this, I would penalize the fielder. The run values are built into the model. So when a fielder doesn’t get to a ball, the run values for the outs are built into his model. They’re just not his outs. He doesn’t get charged with a hit, he gets charged with the run expectation for the batted ball, including outs made by other fielders. I think I was right the first time. To paraphrase Dennis Moore, this redistribution of runs is trickier than I thought.
My guess is that there’s not much difference, and that over a large enough sample all these things even out.
Update: See update at the end of the post.
(By the way, I showed there that shared plays account for a pretty small amount of balls in play.)
Mitchel’s point that who catches the ball tells us about the positioning is a fascinating argument, and he takes that a step further with errors. Most systems, like PMR, treat a ball in play as a binary outcome. Either an out was produced or not. It doesn’t matter if it was scored a hit, error, or failed fielder’s choice; the batter was either out or safe. Mitchel notes that if a fielder makes an error on a bin of parameters with a low probability for making an out, he must have been positioned near where the ball was hit. Again, the human marking of where a ball was hit likely has a high variance. If you pick a spot on the field, and look at where reporters mark the ball, you probably get a circle with that point as the center. Since errors tend to happen on balls that should be easy to field, it’s likely the ball was hit closer to the fielder than the probability model indicates. Therefore, Mitchel imposes a bigger penalty on errors.
In other words, the error is telling us that the fielder shifted relative to the model we’re using. The problem, of course, is that it doesn’t capture hits that happen when the fielder makes a similar shift and is not charged with an error. He gets charged with a low probability event, when he should be charged with a higher one. I might even argue that the fact that the fielder shifted and made the correct decision in the shift should lessen the penalty, since his process was right, even though the result was bad.
What I really take away from this, however, is that it might be possible to build a probabilistic model of positioning. For example, if you look at a centerfielder against left and right handed batters, and you see him making more plays on one side of the field based on handedness or pull percentage of the batters, you can infer that the fielder moves to one side of the field or the other. If you see the distributions look the same, however, he’s probably standing in one place and not moving much.
(PMR accounts for handedness of both the batter and the pitcher, so this should be built into the model.)
Mitchel is trying to tackle the big problem in defense, that range as we measure it is really a combination of the ability to move and the ability to position fielders. Until we get FIELDf/x, we will have a very difficult time separating the two.
My one other comment on Mitchel’s talk was that With or Without You (WOWY) was showing much larger run differences between good and bad fielders than UZR. Mitchel threw out a couple of theories why, but I think it just may be a sample size problem. Since the start of the 1996 season, Jeter played 2323 games. That means he played about 94% of the Yankees’ games in that time. So the sample without Jeter is about 150 games, and I don’t know if Mitchel’s data goes back that far. Most fielders who play enough to qualify as bad at the position must do something else good to stay in the lineup, so in general the without you component is small. Mitchel likely took that into account, but that was the first thing that crossed my mind.
Update: Here are my formal thoughts on the way runs should be charged under PMR. Let’s propose a play that looks like this to the team, a fly ball halfway between the default positions for the center and right fielder:
- Out: 25%, -0.07 runs per ball in play
- Single: 50%, 0.235 runs per ball in play
- Double: 20%, 0.154 runs per ball in play
- Triple: 5%, 0.0545 runs per ball in play
So the expected runs per ball in play is 0.3735, or 37.35 runs per 100 balls in play. In other words, batters should really try to put the ball in play in that spot. We then add up the run value for the actual events on those balls. If the outfielders are good, the team should sum to less than 37.35 runs. If they are poor, more runs.
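As a quick check on that arithmetic, here is a minimal Python sketch. The probabilities and per-ball contributions come straight from the list above. The per-event run values it backs out (each contribution divided by its probability) are my own inference from those numbers, not values quoted from PMR, and the variable names are just for illustration.

```python
# Each outcome: (probability, contribution in runs per ball in play),
# copied from the list above.
outcomes = {
    "out":    (0.25, -0.07),
    "single": (0.50, 0.235),
    "double": (0.20, 0.154),
    "triple": (0.05, 0.0545),
}

# The probabilities should cover every outcome of the ball in play.
total_probability = sum(p for p, _ in outcomes.values())

# Expected runs per ball in play is the sum of the contributions.
expected_runs = sum(contribution for _, contribution in outcomes.values())

# Assumption: each contribution is probability * run value of the event,
# so dividing recovers an implied run value per event.
implied_run_value = {
    name: contribution / p for name, (p, contribution) in outcomes.items()
}

print(f"total probability: {total_probability:.2f}")
print(f"expected runs per ball in play: {expected_runs:.4f}")
print(f"expected runs per 100 balls in play: {expected_runs * 100:.2f}")
for name, value in implied_run_value.items():
    print(f"implied run value of a {name}: {value:.2f}")
```

Under that reading, an out on this ball is worth about -0.28 runs and a single about 0.47.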
This is a shared vector, however, and of the 100 balls in play, 15 are caught by the centerfielder, 10 by the rightfielder. Since the centerfielder gets to 60% of the outs, I’ll assume he gets 60% of the run expectation, and the rightfielder 40%. So when a ball falls in for a hit, the CF and RF split the run value for that event 60-40.
Now what happens when the ball is caught? The program could do the same thing, and split the run value of the out 60-40, or give all the run value to the fielder who caught the ball and a zero to the fielder who didn’t. If you split, you’re not penalizing the fielder who didn’t catch the ball. If you go with the full value of the out to the fielder who catches it, you are penalizing the fielder who doesn’t catch the ball.
If the actual data is around a 60-40 split, it doesn’t matter which system you use, because the results will come out the same. It does matter, however, if the CF is catching 80% of those balls. Mitchel would argue that the scoring of the location of the ball is off, or the outfielders are positioned differently, so don’t penalize the RF for something that isn’t his fault. There is the possibility, however, that the CF is making those plays because he’s a much better outfielder than the RF, in which case the penalty should be applied, and the outfielder making the catch should get full credit for the out. This is one of those known unknowns.
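To make the two options concrete, here is a rough Python sketch of how the credit for the 25 outs would be handed out under each scheme. It assumes an out on this ball is worth -0.28 runs (the -0.07 contribution divided by the 25% out probability, which is my inference rather than a published PMR value), and the function name `credit_outs` and the scheme labels are hypothetical, not part of any actual PMR code.

```python
# Assumption: run value of one out on this ball, inferred as -0.07 / 0.25.
OUT_RUN_VALUE = -0.28

# Expected share of the outs on this shared vector (60-40, CF to RF).
EXPECTED_SHARE = {"CF": 0.6, "RF": 0.4}


def credit_outs(catches, scheme):
    """Return the run credit each fielder gets for the outs made.

    catches: dict mapping fielder to number of balls he actually caught.
    scheme:  "split" divides every out by the expected shares (no penalty);
             "full" gives the whole out to the fielder who caught it.
    """
    if scheme == "split":
        total_outs = sum(catches.values())
        return {
            fielder: total_outs * OUT_RUN_VALUE * share
            for fielder, share in EXPECTED_SHARE.items()
        }
    if scheme == "full":
        return {fielder: n * OUT_RUN_VALUE for fielder, n in catches.items()}
    raise ValueError(f"unknown scheme: {scheme}")


# Catches follow the expected 60-40 split: both schemes agree.
as_expected = {"CF": 15, "RF": 10}
# The CF takes 80% of the catches: the schemes diverge.
cf_heavy = {"CF": 20, "RF": 5}

for label, catches in [("60-40", as_expected), ("80-20", cf_heavy)]:
    for scheme in ("split", "full"):
        credit = credit_outs(catches, scheme)
        rounded = {fielder: round(runs, 2) for fielder, runs in credit.items()}
        print(label, scheme, rounded)
```

With the catches at 60-40, both schemes credit the CF with -4.2 runs and the RF with -2.8. At 80-20, the split scheme leaves those numbers unchanged, while the full-credit scheme moves the CF to -5.6 and the RF to -1.4. That 1.4 run swing is exactly the penalty in question.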
Maybe the right way to do this is to give the fielder making the catch a little more credit. I just feel there is information that we are throwing away if we split the credit on a catch proportionally.