February 20, 2011

Five Year PMR

As a follow-up to my last post on the objective Probabilistic Model of Range (PMR), I downloaded the latest Retrosheet data, complete through 2010. I thought I might extend the model with vectors, but few events have a hit location attached.

On the other hand, Retrosheet data is very consistent from year to year in classifying balls in play as grounders, flies, liners and pops, so I added that to the model. The model now contains three objective parameters (batter handedness, pitcher handedness, and stadium), as well as the slightly subjective batted ball type.

I also decided to look at these models over a longer term, so I’ll do a series of five-year studies covering fielding from 2006 to 2010. I used the 2005 through 2010 data. The model for each season is built from the other five years, so none of the data for the season in question was used to train the model. For example, the model for 2006 is built from 2005, 2007, 2008, 2009 and 2010, and the model for 2009 is built from 2005-2008 and 2010. The only exceptions are the models for Target Field, which only existed in 2010. I also built the model using only data on the visiting fielders, so a great or terrible defender on a home team would not influence the model that much.
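The year selection and the visiting-fielder filter can be sketched in a few lines of Python. This is only an illustration, not the code behind the study: the record layout (`year`, `bat_hand`, `pit_hand`, `park`, `ball_type`, `visitor_fielding`, `out`) is a made-up stand-in for parsed Retrosheet events, and estimating each probability as a simple out rate per bucket is my assumption about how such a model could be fit.

```python
from collections import defaultdict

SEASONS = range(2005, 2011)  # 2005 through 2010, the pool described above


def training_years(target_year, seasons=SEASONS):
    """Return every season in the pool except the one being evaluated."""
    return [year for year in seasons if year != target_year]


def build_model(plays, target_year):
    """Estimate P(out) for each parameter bucket.

    `plays` is a hypothetical list of dicts, one per ball in play.
    Only plays from the training years with the visiting team in the
    field are counted. The plain out rate per bucket is an assumption.
    """
    years = set(training_years(target_year))
    outs = defaultdict(int)
    total = defaultdict(int)
    for play in plays:
        if play["year"] not in years or not play["visitor_fielding"]:
            continue
        key = (play["bat_hand"], play["pit_hand"],
               play["park"], play["ball_type"])
        total[key] += 1
        outs[key] += int(play["out"])
    return {key: outs[key] / total[key] for key in total}
```

Calling `training_years(2006)` gives 2005, 2007, 2008, 2009 and 2010, matching the example above. Target Field would need separate handling, which this sketch does not attempt.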

Just to review, PMR determines the probability of a batted ball being turned into an out based on a set of parameters. Adding those probabilities up for each ball in play gives us the expected number of outs. Calculating an index by the formula (100 * actual outs / predicted outs) allows a ranking, with numbers over 100 good and numbers under 100 poor. When I used BIS data, I included a direction for the ball and a measure of distance. Those elements are subjective, however, so leaving them out removes the biases of the scorers. Since the number of balls in play an individual fielder can handle is small in any year, looking at a longer-term model should give us a better picture of who the best glove men are.
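As a quick illustration of that arithmetic, here is a minimal Python sketch. The function names are mine, and the `der` helper assumes the DER columns in the table below are simply outs divided by balls in play, which is consistent with the published numbers.

```python
def expected_outs(out_probabilities):
    """Sum the per-ball out probabilities to get predicted outs."""
    return sum(out_probabilities)


def pmr_index(actual_outs, predicted_outs):
    """100 * actual outs / predicted outs; over 100 is good."""
    return 100 * actual_outs / predicted_outs


def der(outs, balls_in_play):
    """Share of balls in play turned into outs (assumed definition)."""
    return outs / balls_in_play


# Toy example: four balls in play with made-up out probabilities.
predicted = expected_outs([0.75, 0.25, 0.5, 0.5])  # 2.0 predicted outs
toy_index = pmr_index(3, predicted)                # 150.0 if 3 outs were made

# Check against the Boston row of the team table (17014 in play,
# 11800 actual outs, 11524.4 predicted outs).
bos_index = round(pmr_index(11800, 11524.4), 1)    # 102.4
bos_actual_der = round(der(11800, 17014), 3)       # 0.694
bos_predicted_der = round(der(11524.4, 17014), 3)  # 0.677
```

The toy probabilities are invented for the example; only the Boston figures come from the study.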

Compiling the data for five years, the Red Sox were the best defensive team in the majors.

Team PMR, 2006-2010
Team  In Play  Actual Outs  Predicted Outs  Actual DER  Predicted DER  Index
BOS     17014        11800         11524.4       0.694          0.677  102.4
NYA     17125        11878         11636.5       0.694          0.680  102.1
COL     17595        12096         11894.7       0.687          0.676  101.7
NYN     17490        12168         11973.4       0.696          0.685  101.6
TOR     17110        11874         11691.9       0.694          0.683  101.6
ANA     17487        12023         11880.8       0.688          0.679  101.2
TEX     17830        12260         12117.6       0.688          0.680  101.2
DET     17710        12214         12084.1       0.690          0.682  101.1
SEA     17881        12370         12253.6       0.692          0.685  101.0
SLN     17996        12452         12327.8       0.692          0.685  101.0
ATL     17229        11918         11816.6       0.692          0.686  100.9
SFN     16821        11752         11653.6       0.699          0.693  100.8
PHI     17548        12164         12077.2       0.693          0.688  100.7
TBA     17142        11856         11768.6       0.692          0.687  100.7
ARI     17339        11945         11874.9       0.689          0.685  100.6
CHN     16667        11615         11544.6       0.697          0.693  100.6
LAN     16889        11745         11690.8       0.695          0.692  100.5
CLE     18081        12369         12310.3       0.684          0.681  100.5
OAK     17441        12122         12075.0       0.695          0.692  100.4
MIN     17929        12336         12283.0       0.688          0.685  100.4
SDN     17237        12032         12019.3       0.698          0.697  100.1
KCA     17765        12096         12096.4       0.681          0.681  100.0
BAL     18048        12392         12430.5       0.687          0.689   99.7
WAS     18063        12432         12469.4       0.688          0.690   99.7
FLO     17406        11830         11860.0       0.680          0.681   99.7
CIN     17540        12098         12143.1       0.690          0.692   99.6
MIL     17408        11921         11968.0       0.685          0.687   99.6
PIT     18280        12382         12481.3       0.677          0.683   99.2
HOU     17443        11939         12083.1       0.684          0.693   98.8
CHA     17630        12044         12197.7       0.683          0.692   98.7

I was somewhat surprised to see the Yankees second. The team has brought in a few good defenders in recent years. The Rockies were the best team in the NL, and their speedy outfielders make a difference in the big park. The White Sox inhabit the bottom of the rankings, with the Astros bringing up the rear in the NL.

Over the next week, the studies will cover each position, as well as defenses behind pitchers.
