Baseball Musings
January 21, 2006
Probabilistic Model of Range, 2005

A number of readers have asked over the last two months whether the Probabilistic Model of Range numbers for 2005 would be published this off-season. I'm happy to say I've acquired the data, and this week I'll be presenting tables on teams, on the defenses behind pitchers, and on individual pitchers.

Last year's post explains the model, so I won't repeat that explanation here. The idea is to look not just at the balls turned into outs, but at how difficult those balls were to turn into outs. Teams or fielders who turn difficult plays into outs do well. Teams or fielders who let easy balls drop for hits (or make errors on them) do poorly.
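As a rough illustration of that idea, here is a minimal Python sketch. It assumes each ball in play has already been assigned to a bucket (the bucket names below are hypothetical stand-ins for whatever characteristics the model actually conditions on). It estimates, from historical data, the chance that balls in each bucket become outs, then sums those chances over a team's balls in play to get predicted outs. The toy data is invented for illustration; this is not the model's actual code.

```python
from collections import defaultdict

# Each ball in play is a (bucket, was_out) pair. The bucket is a hypothetical
# label standing in for whatever characteristics the real model uses.


def out_rates(history):
    """Estimate P(out) for each bucket from historical balls in play."""
    outs = defaultdict(int)
    total = defaultdict(int)
    for bucket, was_out in history:
        total[bucket] += 1
        outs[bucket] += was_out
    return {bucket: outs[bucket] / total[bucket] for bucket in total}


def predicted_outs(balls, rates):
    """Sum each ball's estimated probability of being turned into an out."""
    return sum(rates[bucket] for bucket, _ in balls)


def der(balls):
    """Defensive efficiency: share of balls in play turned into outs."""
    return sum(was_out for _, was_out in balls) / len(balls)


# Toy data, invented purely for illustration.
history = ([("grounder_to_SS", 1)] * 8 + [("grounder_to_SS", 0)] * 2
           + [("liner_to_LF", 1)] * 3 + [("liner_to_LF", 0)] * 7)
team = [("grounder_to_SS", 1), ("grounder_to_SS", 1),
        ("liner_to_LF", 1), ("liner_to_LF", 0)]

rates = out_rates(history)
expected = predicted_outs(team, rates)
print(f"DER {der(team):.3f}, predicted DER {expected / len(team):.3f}")
```

In this toy case the team turns 3 of 4 balls into outs against 2.2 predicted (DER .750 vs. a predicted .550); the caught line drive is the kind of difficult play the approach rewards.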

One of the most hotly debated aspects of this model is how it handles parks. The biggest criticism is that home players have too much influence on the model. I'm going to present three team tables that show how the treatment of parks changes the results.

One will be the model as described in the previous work.

One will be the model without parks in the model.

The third will be a combination of the two, 50% of each.
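The 50/50 model can be reproduced by averaging each team's two predicted-outs figures. Because both models divide by the same number of balls in play, averaging predicted outs is the same as averaging predicted DER. A minimal sketch, checked against the Phillies' figures in the tables below (the `park_weight` parameter is a hypothetical generalization; only 0.5 is used here):

```python
def blended_predicted_outs(with_parks, without_parks, park_weight=0.5):
    """Mix the two models' predicted outs; park_weight=0.5 is the 50/50 model."""
    return park_weight * with_parks + (1.0 - park_weight) * without_parks


# Phillies: 2853.80 predicted outs with parks, 2812.44 without.
print(f"{blended_predicted_outs(2853.80, 2812.44):.2f}")  # 2833.12
```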

All models are built on data from four years, 2002-2005.

Probabilistic Model of Range, 2005. Model Includes Parks
Team          In Play  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Astros           4204         2963         2854.17  0.705          0.679     0.02589
Indians          4385         3108         2995.26  0.709          0.683     0.02571
Phillies         4211         2962         2853.80  0.703          0.678     0.02570
Athletics        4286         3064         2954.86  0.715          0.689     0.02546
White Sox        4457         3175         3066.86  0.712          0.688     0.02426
Cardinals        4414         3101         3007.96  0.703          0.681     0.02108
Blue Jays        4511         3156         3063.16  0.700          0.679     0.02058
Braves           4559         3162         3073.91  0.694          0.674     0.01932
Twins            4545         3193         3107.42  0.703          0.684     0.01883
Angels           4383         3070         2998.12  0.700          0.684     0.01640
Giants           4520         3152         3080.03  0.697          0.681     0.01592
Orioles          4377         3032         2964.67  0.693          0.677     0.01538
Pirates          4467         3095         3032.44  0.693          0.679     0.01400
Diamondbacks     4571         3118         3059.45  0.682          0.669     0.01281
Red Sox          4575         3127         3068.95  0.683          0.671     0.01269
Devil Rays       4560         3112         3054.72  0.682          0.670     0.01256
Cubs             4117         2871         2819.97  0.697          0.685     0.01239
Mariners         4546         3184         3128.12  0.700          0.688     0.01229
Tigers           4527         3152         3097.51  0.696          0.684     0.01204
Brewers          4252         2960         2916.77  0.696          0.686     0.01017
Rangers          4697         3200         3158.10  0.681          0.672     0.00892
Dodgers          4392         3073         3036.02  0.700          0.691     0.00842
Mets             4424         3094         3058.20  0.699          0.691     0.00809
Rockies          4537         3043         3013.43  0.671          0.664     0.00652
Padres           4423         3051         3047.08  0.690          0.689     0.00089
Marlins          4367         2965         2965.42  0.679          0.679    -0.00010
Yankees          4483         3087         3092.01  0.689          0.690    -0.00112
Nationals        4538         3161         3166.79  0.697          0.698    -0.00128
Royals           4611         3068         3099.97  0.665          0.672    -0.00693
Reds             4650         3148         3182.99  0.677          0.685    -0.00753
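The DER, Predicted DER, and Difference columns all follow from the first three numbers in each row: DER is actual outs divided by balls in play, Predicted DER is predicted outs divided by balls in play, and Difference is DER minus Predicted DER. A short Python check against the Astros row (the `TeamRow` class is just an illustrative container):

```python
from dataclasses import dataclass


@dataclass
class TeamRow:
    team: str
    in_play: int           # fieldable balls in play
    actual_outs: int
    predicted_outs: float  # model's expected outs for these balls

    @property
    def der(self) -> float:
        return self.actual_outs / self.in_play

    @property
    def predicted_der(self) -> float:
        return self.predicted_outs / self.in_play

    @property
    def difference(self) -> float:
        return self.der - self.predicted_der


astros = TeamRow("Astros", 4204, 2963, 2854.17)
print(f"{astros.der:.3f} {astros.predicted_der:.3f} {astros.difference:.5f}")
# 0.705 0.679 0.02589
```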

Unlike 2004, this was a very good defensive year. Seven of the top eight teams in the list made the playoffs or were in contention as late as the last week of the season. Now for the teams with no park adjustment.

Probabilistic Model of Range, 2005. Model Does Not Include Parks
Team          In Play  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Phillies         4211         2962         2812.44  0.703          0.668     0.03552
Athletics        4286         3064         2921.09  0.715          0.682     0.03334
Indians          4385         3108         2970.70  0.709          0.677     0.03131
Astros           4204         2963         2835.95  0.705          0.675     0.03022
Braves           4559         3162         3043.69  0.694          0.668     0.02595
White Sox        4457         3175         3061.04  0.712          0.687     0.02557
Cardinals        4414         3101         2992.97  0.703          0.678     0.02447
Blue Jays        4511         3156         3066.66  0.700          0.680     0.01981
Giants           4520         3152         3062.55  0.697          0.678     0.01979
Dodgers          4392         3073         2992.05  0.700          0.681     0.01843
Cubs             4117         2871         2799.86  0.697          0.680     0.01728
Nationals        4538         3161         3082.57  0.697          0.679     0.01728
Orioles          4377         3032         2960.89  0.693          0.676     0.01625
Diamondbacks     4571         3118         3051.28  0.682          0.668     0.01460
Angels           4383         3070         3007.42  0.700          0.686     0.01428
Twins            4545         3193         3130.04  0.703          0.689     0.01385
Pirates          4467         3095         3034.07  0.693          0.679     0.01364
Mariners         4546         3184         3124.61  0.700          0.687     0.01306
Tigers           4527         3152         3101.99  0.696          0.685     0.01105
Brewers          4252         2960         2913.06  0.696          0.685     0.01104
Mets             4424         3094         3051.37  0.699          0.690     0.00964
Devil Rays       4560         3112         3068.61  0.682          0.673     0.00951
Rangers          4697         3200         3165.60  0.681          0.674     0.00732
Red Sox          4575         3127         3104.20  0.683          0.679     0.00498
Padres           4423         3051         3039.75  0.690          0.687     0.00254
Rockies          4537         3043         3035.26  0.671          0.669     0.00171
Marlins          4367         2965         2958.27  0.679          0.677     0.00154
Reds             4650         3148         3155.28  0.677          0.679    -0.00157
Yankees          4483         3087         3135.64  0.689          0.699    -0.01085
Royals           4611         3068         3130.12  0.665          0.679    -0.01347

You can see the big drop in the Red Sox defense when the park isn't included in the calculation of team range. Lots of balls that would be outs in other parks hit the wall at Fenway. Without the adjustment, the Red Sox defense looks worse than it is.

Here's the smoothed model:

Probabilistic Model of Range, 2005. 50% Model With Parks, 50% Model Without Parks
Team          In Play  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Phillies         4211         2962         2833.12  0.703          0.673     0.03061
Athletics        4286         3064         2937.98  0.715          0.685     0.02940
Indians          4385         3108         2982.98  0.709          0.680     0.02851
Astros           4204         2963         2845.06  0.705          0.677     0.02805
White Sox        4457         3175         3063.95  0.712          0.687     0.02492
Cardinals        4414         3101         3000.46  0.703          0.680     0.02278
Braves           4559         3162         3058.80  0.694          0.671     0.02264
Blue Jays        4511         3156         3064.91  0.700          0.679     0.02019
Giants           4520         3152         3071.29  0.697          0.679     0.01786
Twins            4545         3193         3118.73  0.703          0.686     0.01634
Orioles          4377         3032         2962.78  0.693          0.677     0.01581
Angels           4383         3070         3002.77  0.700          0.685     0.01534
Cubs             4117         2871         2809.92  0.697          0.683     0.01484
Pirates          4467         3095         3033.25  0.693          0.679     0.01382
Diamondbacks     4571         3118         3055.36  0.682          0.668     0.01370
Dodgers          4392         3073         3014.04  0.700          0.686     0.01343
Mariners         4546         3184         3126.36  0.700          0.688     0.01268
Tigers           4527         3152         3099.75  0.696          0.685     0.01154
Devil Rays       4560         3112         3061.67  0.682          0.671     0.01104
Brewers          4252         2960         2914.92  0.696          0.686     0.01060
Mets             4424         3094         3054.78  0.699          0.691     0.00886
Red Sox          4575         3127         3086.58  0.683          0.675     0.00884
Rangers          4697         3200         3161.85  0.681          0.673     0.00812
Nationals        4538         3161         3124.68  0.697          0.689     0.00800
Rockies          4537         3043         3024.35  0.671          0.667     0.00411
Padres           4423         3051         3043.42  0.690          0.688     0.00171
Marlins          4367         2965         2961.85  0.679          0.678     0.00072
Reds             4650         3148         3169.14  0.677          0.682    -0.00455
Yankees          4483         3087         3113.83  0.689          0.695    -0.00598
Royals           4611         3068         3115.04  0.665          0.676    -0.01020

I'm open as always to comments on which of these you think is best, or how any of them might be improved. The best suggestions I've heard, however, involve much more complicated programming. I like this simple model.

One thing is very clear: the Yankees, Royals, and Reds did not help their pitching staffs in 2005, no matter how you look at the data.

A hat tip to Mitchel Lichtman, who first used this idea in UZR but has since gone into private practice.


Comments

I know this subject was brought up previously in the other PMR discussions, but looking at the Difference column, most teams are positive. What I'd like to see is some metric of above/below average. That's one of the reasons numbers like Rate and OPS+ are so easy to use: just looking at the number tells you how the player did compared to average.

Posted by: sabernar at January 21, 2006 09:02 PM

With 25 of 30 teams in the basic model outperforming their predicted DER, and a 26th team being virtually identical to the model, it seems to me that there's either a problem in the model, or something in the BIP distribution changed between 2004 and 2005. The model should not be underpredicting DER for that many teams.

Posted by: Mike Emeigh at January 21, 2006 09:19 PM
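Mike's count, and the league-wide gap between actual and predicted DER, can be checked directly against the "Model Includes Parks" table. This Python sketch totals the 30 team rows exactly as listed there; it only aggregates the published numbers and says nothing about why the gap exists.

```python
# (team, balls in play, actual outs, predicted outs), copied from the
# "Model Includes Parks" table.
ROWS = [
    ("Astros", 4204, 2963, 2854.17), ("Indians", 4385, 3108, 2995.26),
    ("Phillies", 4211, 2962, 2853.80), ("Athletics", 4286, 3064, 2954.86),
    ("White Sox", 4457, 3175, 3066.86), ("Cardinals", 4414, 3101, 3007.96),
    ("Blue Jays", 4511, 3156, 3063.16), ("Braves", 4559, 3162, 3073.91),
    ("Twins", 4545, 3193, 3107.42), ("Angels", 4383, 3070, 2998.12),
    ("Giants", 4520, 3152, 3080.03), ("Orioles", 4377, 3032, 2964.67),
    ("Pirates", 4467, 3095, 3032.44), ("Diamondbacks", 4571, 3118, 3059.45),
    ("Red Sox", 4575, 3127, 3068.95), ("Devil Rays", 4560, 3112, 3054.72),
    ("Cubs", 4117, 2871, 2819.97), ("Mariners", 4546, 3184, 3128.12),
    ("Tigers", 4527, 3152, 3097.51), ("Brewers", 4252, 2960, 2916.77),
    ("Rangers", 4697, 3200, 3158.10), ("Dodgers", 4392, 3073, 3036.02),
    ("Mets", 4424, 3094, 3058.20), ("Rockies", 4537, 3043, 3013.43),
    ("Padres", 4423, 3051, 3047.08), ("Marlins", 4367, 2965, 2965.42),
    ("Yankees", 4483, 3087, 3092.01), ("Nationals", 4538, 3161, 3166.79),
    ("Royals", 4611, 3068, 3099.97), ("Reds", 4650, 3148, 3182.99),
]

in_play = sum(bip for _, bip, _, _ in ROWS)
actual = sum(outs for _, _, outs, _ in ROWS)
predicted = sum(pred for _, _, _, pred in ROWS)
# A team "beats" the model when it made more outs than predicted.
above = sum(1 for _, _, outs, pred in ROWS if outs > pred)

print(f"BIP {in_play}, DER {actual / in_play:.3f}, "
      f"predicted DER {predicted / in_play:.3f}")
print(f"{above} of {len(ROWS)} teams beat their predicted outs")
```

This prints `BIP 133589, DER 0.694, predicted DER 0.681` and `25 of 30 teams beat their predicted outs`, matching the 25 teams Mike counted.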

I've re-done the numbers for my own use with a fudge factor. If anyone wants them, drop me an e-mail.

David

Oh, and David, this is great stuff! Keep it coming.

Posted by: David Gassko at January 21, 2006 09:23 PM

Mike, the model is based on four years of data. My guess is that if I ran all four years of team seasons through it, you'd see an even split above and below 0.

Posted by: David Pinto at January 21, 2006 09:29 PM

David, interesting data. One question and one observation. How do you interpret DER? If the Phillies have a DER of .703, does that mean there's a 70 percent chance that a ball put in play will turn into an out? And I noticed that there's not a lot of variation in the data; it seems like all the teams are in the .68 to .70 range. Is there any concrete way to describe how a team with a DER of .700 differs from a team with a DER of .680?

Posted by: steve at January 22, 2006 12:23 AM

Steve, your interpretation of DER is correct. These are fieldable balls in play, so home runs hit out of the park are removed from the equation.

It may help to think in terms of 1.0 - DER, which is approximately the batting average against the defense. So the difference between a .700 DER and a .680 DER is the difference between a .300 hitter and a .320 hitter.

Posted by: David Pinto at January 22, 2006 08:21 AM
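David's rule of thumb is simple arithmetic. A minimal sketch, with the caveat from his reply that 1.0 - DER only approximates batting average against, since it covers fieldable balls in play only (the function name is illustrative):

```python
def average_against(der):
    """Rough batting average allowed on fieldable balls in play (1.0 - DER)."""
    return 1.0 - der


for der in (0.700, 0.680):
    print(f"DER {der:.3f} -> about {average_against(der):.3f} against")
```

The two lines printed show .300 and .320, the gap between the two hitters in David's example.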

David, you rock! Love this stuff.

Besides the Red Sox drop without the park variable, the Nationals' swing is also quite drastic: 28th with the park adjustment, 12th without it. That big stadium must let them catch many fly balls that would be home runs elsewhere.

Posted by: Jason at January 22, 2006 08:56 AM

Really great information.

Another way to think about what a .703 DER means is that the Phillies' fielders made 129 more outs than expected. That's about 100 runs (depending on the distribution between infield and outfield), or a reduction in team RA/G of 0.62. In fact, expressing the "Difference" column in outs (or runs) rather than DER would probably be more helpful.

"Unlike 2004, this was a very good defensive year."
With about 130,000 BIP each year, I'm skeptical that there are really 'bad' or 'good' fielding years. The Hardball Times (THT) site shows overall DER at .695 in both 2004 and 2005. If that's right, it's hard to conclude this was a better fielding year. And even if DER was higher this year, I would look to factors like weather, park changes, or strike zone changes before concluding that the overall level of fielding changed.

Posted by: Guy at January 23, 2006 11:06 AM
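Guy's conversion can be written out as a small calculation. The 129 extra outs match the Phillies' row in the 50/50 table (2962 actual vs. 2833.12 predicted). The run value per extra out is an assumption: 0.775 is simply backed out of Guy's "129 outs is about 100 runs," and he notes the true figure depends on the infield/outfield mix. The 162-game schedule is the only other input.

```python
GAMES = 162
RUNS_PER_EXTRA_OUT = 0.775  # assumption: backed out of "129 outs ~ 100 runs"


def fielding_value(actual_outs, predicted_outs, runs_per_out=RUNS_PER_EXTRA_OUT):
    """Return (extra outs, runs saved, runs saved per game)."""
    extra_outs = actual_outs - predicted_outs
    runs = extra_outs * runs_per_out
    return extra_outs, runs, runs / GAMES


outs, runs, per_game = fielding_value(2962, 2833.12)  # Phillies, 50/50 model
print(f"{outs:.0f} extra outs, about {runs:.0f} runs, {per_game:.2f} RA/G")
```

This prints `129 extra outs, about 100 runs, 0.62 RA/G`; a different run value per play would scale the last two numbers proportionally.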

David is taking the following perspective, in his presentation and perhaps in his opinion:

Given that the "true mean" league average is zero for the last 4 years, any deviation within those 4 years at the league level is a true difference in skill.

There are a multitude of reasons why the "sample" mean is not zero for each of the last 4 years (within the 4-year population). A change in personnel or in the quality of fielders is one, but that is probably way down the list. External factors, like the climate or the ball (insert your favorite here), can also be a cause.

It is much more beneficial, and probably more accurate, to set the "true mean" of each sample year to zero. That makes in-season comparisons much easier, and easier to grasp.

Year-to-year comparisons are probably more accurate as well (rather than making it seem like 80% of the fielders improved, or whatever the number is).

That said, I love this!

Posted by: tangotiger at January 23, 2006 02:40 PM
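Tangotiger's suggestion amounts to re-centering each season so the league-average Difference is zero. One way to do that, sketched below in Python, is to subtract the league's BIP-weighted gap between actual and predicted DER from every team. This is an illustrative approach, not the method used in the tables, and the three-team list is only a stand-in; in practice the league gap would come from all 30 teams.

```python
def recenter(rows):
    """Return each team's DER difference minus the league's BIP-weighted gap.

    rows: (team, balls_in_play, actual_outs, predicted_outs) tuples.
    After the shift, the BIP-weighted mean difference is zero.
    """
    total_bip = sum(bip for _, bip, _, _ in rows)
    total_gap = sum(actual - predicted for _, _, actual, predicted in rows)
    league_gap = total_gap / total_bip
    return {team: (actual - predicted) / bip - league_gap
            for team, bip, actual, predicted in rows}


# Illustrative subset of the "Model Includes Parks" table; a real
# re-centering would use all 30 teams.
sample = [("Phillies", 4211, 2962, 2853.80),
          ("Red Sox", 4575, 3127, 3068.95),
          ("Royals", 4611, 3068, 3099.97)]
for team, diff in recenter(sample).items():
    print(f"{team:10s} {diff:+.5f}")
```

The shift is the same for every team, so the ordering is unchanged; only the zero point moves, which is what makes "above or below average" readable at a glance.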

There are two things being tracked: real DER and predicted DER. Real DER changes very little: approximately .691 in 2004 and .694 in 2005, according to David's data. If 3 plays out of 1,000 make '05 a "better fielding year," well, OK (but the difference is within the margin of error). But the predicted DER goes from .698 last year to .681 this year. That doesn't make sense to me. The distribution of 125,000 BIP can't possibly be that different.

Posted by: Guy at January 23, 2006 04:42 PM
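Guy's margin-of-error point can be made concrete with a binomial approximation, treating every ball in play as an independent out/no-out trial (a simplifying assumption; real balls in play are not independent draws). Using his figures of .691 and .694 on roughly 125,000 balls in play each year:

```python
import math


def der_diff_moe(der1, n1, der2, n2, z=1.96):
    """Approximate 95% margin of error for the difference of two DERs.

    Assumes each ball in play is an independent binomial trial.
    """
    se = math.sqrt(der1 * (1 - der1) / n1 + der2 * (1 - der2) / n2)
    return z * se


moe = der_diff_moe(0.691, 125_000, 0.694, 125_000)
print(f"difference 0.003, 95% margin of error about {moe:.4f}")
```

The margin comes out near .0036, a bit larger than the .003 change in real DER, while the .017 drop in predicted DER is far outside it.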

Guy,

You raise an excellent point. I'll look into this. One problem is that if you look at the table from last year, there's a different underlying model, in that it's based on three years (2002-2004) of data. I don't know if that's enough to cause a difference, but I'm going to run 2004 against the four year model and see.

Posted by: David Pinto at January 23, 2006 04:53 PM