Baseball Musings
November 17, 2006
Probabilistic Model of Range, 2006

The other day I published the first of the Probabilistic Model of Range tables, looking at overall team play. However, since doing that I noticed something didn't add up. When I looked at individual fielders, I was getting very strange results. It turns out that Baseball Info Solutions made a change to the scoring system this year designed to improve the accuracy of locating balls in play. They increased the size of the graphic they use to capture the data.

This has the nice effect of allowing the reporter to be more precise in marking where the ball landed or was caught. However, the data is somewhat different than the data from previous years, and this was causing my models to exhibit strange behavior.

After spending a day studying the data, I've concluded that indeed, the 2006 data is more accurate. So in order to avoid the pitfalls of mixing the old and new data, I'm going to use just the 2006 data to figure PMR. At some point I may revisit the older data and try to find a way to translate it into this model. But for now, please ignore the previous post.

Probabilistic Model of Range, 2006. Model Includes Parks, Smoothed Visiting Team Fielding. Based on 2006 Season Only.
Team          InPlay  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Cardinals       4448         3096         3045.22  0.696          0.685     0.01142
Blue Jays       4326         2994         2951.45  0.692          0.682     0.00984
Tigers          4439         3112         3069.37  0.701          0.691     0.00960
Mets            4310         3028         2987.11  0.703          0.693     0.00949
Cubs            4152         2903         2865.00  0.699          0.690     0.00915
Yankees         4472         3103         3065.29  0.694          0.685     0.00843
Giants          4422         3098         3062.31  0.701          0.693     0.00807
White Sox       4528         3138         3106.11  0.693          0.686     0.00704
Angels          4301         2970         2940.33  0.691          0.684     0.00690
Brewers         4300         2950         2922.74  0.686          0.680     0.00634
Dodgers         4536         3084         3057.68  0.680          0.674     0.00580
Royals          4618         3120         3093.21  0.676          0.670     0.00580
Mariners        4431         3054         3029.47  0.689          0.684     0.00554
Padres          4386         3116         3093.20  0.710          0.705     0.00520
Braves          4490         3078         3060.69  0.686          0.682     0.00386
Diamondbacks    4462         3049         3033.47  0.683          0.680     0.00348
Twins           4328         2967         2952.29  0.686          0.682     0.00340
Astros          4342         3039         3024.90  0.700          0.697     0.00325
Phillies        4438         3021         3009.27  0.681          0.678     0.00264
Rangers         4542         3084         3075.69  0.679          0.677     0.00183
Orioles         4435         3013         3011.80  0.679          0.679     0.00027
Rockies         4590         3129         3139.99  0.682          0.684    -0.00239
Athletics       4530         3120         3133.56  0.689          0.692    -0.00299
Red Sox         4463         3028         3041.66  0.678          0.682    -0.00306
Marlins         4339         2971         2985.34  0.685          0.688    -0.00331
Indians         4594         3099         3122.02  0.675          0.680    -0.00501
Nationals       4594         3173         3203.39  0.691          0.697    -0.00662
Reds            4527         3081         3114.48  0.681          0.688    -0.00740
Devil Rays      4545         3048         3085.21  0.671          0.679    -0.00819
Pirates         4448         2997         3034.28  0.674          0.682    -0.00838
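
For anyone who wants to check the arithmetic, here is a minimal sketch of how the DER columns relate to the counts. It assumes DER is simply outs divided by balls in play, which matches the published columns to three decimals. This is not the PMR model itself (the predicted outs are taken as given from the table), and the names ROWS, der, team_line and so on are illustrative, not part of any real tool.

```python
# Rows copied from the table: (team, balls in play, actual outs, predicted outs).
ROWS = [
    ("Cardinals", 4448, 3096, 3045.22),
    ("Blue Jays", 4326, 2994, 2951.45),
    ("Tigers", 4439, 3112, 3069.37),
    ("Mets", 4310, 3028, 2987.11),
    ("Cubs", 4152, 2903, 2865.00),
    ("Yankees", 4472, 3103, 3065.29),
    ("Giants", 4422, 3098, 3062.31),
    ("White Sox", 4528, 3138, 3106.11),
    ("Angels", 4301, 2970, 2940.33),
    ("Brewers", 4300, 2950, 2922.74),
    ("Dodgers", 4536, 3084, 3057.68),
    ("Royals", 4618, 3120, 3093.21),
    ("Mariners", 4431, 3054, 3029.47),
    ("Padres", 4386, 3116, 3093.20),
    ("Braves", 4490, 3078, 3060.69),
    ("Diamondbacks", 4462, 3049, 3033.47),
    ("Twins", 4328, 2967, 2952.29),
    ("Astros", 4342, 3039, 3024.90),
    ("Phillies", 4438, 3021, 3009.27),
    ("Rangers", 4542, 3084, 3075.69),
    ("Orioles", 4435, 3013, 3011.80),
    ("Rockies", 4590, 3129, 3139.99),
    ("Athletics", 4530, 3120, 3133.56),
    ("Red Sox", 4463, 3028, 3041.66),
    ("Marlins", 4339, 2971, 2985.34),
    ("Indians", 4594, 3099, 3122.02),
    ("Nationals", 4594, 3173, 3203.39),
    ("Reds", 4527, 3081, 3114.48),
    ("Devil Rays", 4545, 3048, 3085.21),
    ("Pirates", 4448, 2997, 3034.28),
]


def der(outs, in_play):
    """Defensive efficiency: share of balls in play turned into outs (assumed definition)."""
    return outs / in_play


def team_line(row):
    """Return (team, actual DER, predicted DER, difference) for one table row."""
    team, in_play, actual, predicted = row
    actual_der = der(actual, in_play)
    predicted_der = der(predicted, in_play)
    return team, actual_der, predicted_der, actual_der - predicted_der


lines = [team_line(row) for row in ROWS]

# League-wide totals: how many more outs were recorded than the model predicted.
total_in_play = sum(row[1] for row in ROWS)
total_actual = sum(row[2] for row in ROWS)
total_predicted = sum(row[3] for row in ROWS)
surplus_outs = total_actual - total_predicted

# Team whose mix of balls in play the model rated hardest to field.
lowest_predicted_team = min(lines, key=lambda line: line[2])[0]

if __name__ == "__main__":
    for team, actual_der, predicted_der, diff in lines:
        print(f"{team:<12}  {actual_der:.3f}  {predicted_der:.3f}  {diff:+.5f}")
    print(f"League DER {der(total_actual, total_in_play):.3f}, "
          f"predicted {der(total_predicted, total_in_play):.3f}, "
          f"surplus outs {surplus_outs:+.2f}")
```

The Difference column is just actual DER minus predicted DER, so a positive number means the fielders turned more balls into outs than the model expected given where and how those balls were hit.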

There are more changes at the top than at the bottom. The Cardinals rise to number one. The Royals drop to number 12. Still, the Royals' defense is better than many thought. Their predicted DER was the worst in the majors, meaning the pitching staff was not making it easy on the defense. The Dodgers also do much better under this system, going from a negative to a positive.

The Pirates replace the Nationals as the worst fielding team, with the Devil Rays in the penultimate slot. I guess Tampa can't do anything well, hit, pitch or field. More to come this weekend.


Comments

Wait, I'm a bit confused (admittedly I only skimmed the explanation) but you said the "Cardinals rise to number one," and "The Pirates replace the Nationals as the worst fielding team..."

"DER" measures what happens, "Predicted DER" measures what we thought would have happened based on history, right?

The "Difference" isn't the measure of the teams' defense but a measure of how differently the teams' defense performed relative to history... Did I get that right?

So the Cardinals didn't have the best defense, they just performed the highest relative to their predicted ability... Again, am I on the right track?

Thus, the Devil Rays' defense was 3 points worse than the Pirates' after all...

I'm admittedly confused and asking more than I'm trying to correct, I just read your chart very differently than you described it.

Thanks for clearing things up.

Posted by: Peter Friberg at November 18, 2006 01:59 AM

I have a different question. If you're only using this year's data to calculate this year's PMR, why isn't the total number of actual outs equal to predicted outs? Or are they not supposed to be?

Posted by: Steven at November 18, 2006 02:18 AM

I had the same question as Steven; actual outs are +346 compared to predicted outs, with an actual DER of .687 vs predicted .685.

Is this entirely a consequence of the smoothed visiting model, which can give extra weight to balls put in play by the home team? [Whatever the reason, the home field advantage partly manifests itself in the home team producing better hitting statistics, including batting average on balls in play; this should mean that the home team hits more difficult to field balls. I know your model uses parameters for hit type and "how hard the ball was hit", but perhaps this still isn't fine-grained enough to fully capture the extra difficulty of the home team's batted balls.] If this isn't the explanation, the only other explanation I can think of is some accounting problem on balls which could be fielded by more than one fielder.

Posted by: JoeArthur at November 18, 2006 02:50 AM

In case my previous comment isn't clear enough, I'm suggesting that the smoothed visiting model may lead to a lower baseline for predicted DER. If so, this might not affect relative standing of teams and players, but would mean that performing at the baseline rate is really a below average performance.

Posted by: JoeArthur at November 18, 2006 02:57 AM

Peter:

The Cardinals rising to the top was versus the original chart, using five years of data. Sorry I didn't explain that more fully.

But you have the general idea. Defense is a combination of pitching and fielding. There are balls that are easy (a grounder right at short) and balls that are difficult to field (a line drive in the left-center power alley). If you have a good defensive team, but the opposition hits a lot of line drives in the gap, then their DER will be low. But if they do better than we predict, based on the balls in play, then it wasn't the defense's fault. It was either lucky hitters or bad pitchers. I think KC is a great example of bad pitchers making the defense look worse than it is.

Joe and Steven,

Yes, this is a fallout of the smoothed model. However, the objection raised in the past to using all balls in play to build the model is that a park is too influenced by the home player. In other words, the people who object to too much weight for home fielders in the model think that there's a home field advantage for fielders. You seem to be saying there's a home field advantage for hitters that's spooking the model.

The best way around this is many years of data. We were going in that direction, but with the changes to the BIS scoring system this season, we're going to need to wait again.

Posted by: David Pinto at November 18, 2006 09:21 AM

Thanks for going to all the trouble, David. This table looks closer to Dewan's table, though there are still some weird cases. In particular, the Yankees, White Sox and Angels are all near the bottom of his list, but near the top of yours.

It will be interesting to see if we can isolate the differences in ground balls and/or fly balls.

Posted by: studes at November 18, 2006 12:17 PM

David,

I thought the rationale for the visiting team model was that the baseline expectation for a particular park could be skewed by the regular home team fielder if he was unusually good or bad, since he might account for close to half the opportunities. It's worth pointing out that with the unbalanced schedule, if you're in a 5 team division, around 45% of the visiting team data will come from just 4 teams. So there's still some room for the visiting team data to be a biased sample of average skill. That source of bias should smooth out with more years of data.

But my theory was that many years of data won't help with another problem. Your model is trying to adjust for the difficulty of fielding batted balls. Looking at the problem from the other direction [offense], it is clear that home team hitters outperform visiting team hitters. Part of that advantage is fewer strikeouts and more walks, part of it is a higher rate of home runs. But after removing the offensive outcomes not related to fieldable balls, the home team still does a little better. You can see this by computing BABIP on home and away split data. It's not just scorer bias in deciding hits versus errors, because the rate of doubles and triples is higher, and batters seem to hit line drives and fly balls slightly more often at home (according to splits from STATS).

Because they hit more home runs and because they have a little higher batting average on balls in play, my inference is that the home team hits the ball harder on average, resulting in more difficult-to-field balls. The BIS data is very granular on direction and distance, but not very granular on how hard the ball is hit or the exact trajectory. I am speculating that it may be the case that within the group of medium ground balls to vector J, if you had very precise measurements, you might see that the home team's batted balls were hit an average of 3 mph faster and more likely to be one-hoppers than two-hoppers. In that case the visiting fielders would perform slightly more poorly than the home fielders. To whatever extent this is actually true, visiting team performance sets a baseline slightly below average.

That would be a limitation of the data, not a critique of your model. But it does impact interpretation of the results ...

Posted by: JoeArthur at November 18, 2006 12:39 PM

Joe: You make a good point. However, I don't think the gap results only from home hitters hitting the ball harder. It's also likely that fielders field better when they are on their home, familiar turf.

In any case, I think the best solution is to adjust upward the expected DER for all players. It could be .002 (or whatever) for everyone, or calculate the real:expected DER for each position separately and make the appropriate adjustment.

Posted by: Guy at November 21, 2006 03:10 PM