Baseball Musings
January 29, 2005
Probabilistic Model of Range, Shortstops

It's time to start looking at individual players. We'll start with the position that gets the most opportunities, shortstop. As the following table shows, it wasn't a great season for these middle infielders.

Probabilistic Model of Range, Shortstops 2004 (Minimum 1000 balls in play)
Player            In Play  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Pokey Reese          1532          206          200.75  0.134          0.131     0.00343
Adam Everett         2356          315          309.60  0.134          0.131     0.00229
Cristian Guzman      3950          499          492.35  0.126          0.125     0.00168
Julio Lugo           3874          495          492.12  0.128          0.127     0.00074
Rich Aurilia         2070          243          242.28  0.117          0.117     0.00035
Bobby Crosby         4132          557          557.61  0.135          0.135    -0.00015
Jose C Lopez         1533          164          165.00  0.107          0.108    -0.00066
Jimmy Rollins        4187          473          476.56  0.113          0.114    -0.00085
Alex Gonzalez        3996          482          485.71  0.121          0.122    -0.00093
Neifi Perez          1729          202          203.81  0.117          0.118    -0.00105
Cesar Izturis        4119          495          500.91  0.120          0.122    -0.00144
Chris Woodward       1625          194          196.74  0.119          0.121    -0.00169
Carlos Guillen       3597          490          496.37  0.136          0.138    -0.00177
Chris Gomez          1992          230          233.60  0.115          0.117    -0.00181
Wilson Delgado       1053          145          149.37  0.138          0.142    -0.00415
Orlando Cabrera      4090          497          514.77  0.122          0.126    -0.00434
Khalil Greene        3634          428          444.56  0.118          0.122    -0.00456
Craig Counsell       3432          403          419.30  0.117          0.122    -0.00475
Jose Valentin        3141          412          427.57  0.131          0.136    -0.00496
Jack Wilson          4096          532          555.52  0.130          0.136    -0.00574
Ramon E Martinez     1507          193          201.93  0.128          0.134    -0.00593
Edgar Renteria       3921          459          484.36  0.117          0.124    -0.00647
Derek Jeter          4178          493          521.56  0.118          0.125    -0.00684
Jose Vizcaino        1399          171          181.51  0.122          0.130    -0.00751
Miguel Tejada        4340          573          608.49  0.132          0.140    -0.00818
Royce Clayton        3971          452          485.18  0.114          0.122    -0.00836
Michael Young        4382          483          520.15  0.110          0.119    -0.00848
Kazuo Matsui         3004          370          395.82  0.123          0.132    -0.00860
Deivi Cruz           2430          296          318.30  0.122          0.131    -0.00918
Omar Vizquel         3833          437          473.87  0.114          0.124    -0.00962
Alex Cintron         3320          407          438.92  0.123          0.132    -0.00962
Angel Berroa         3745          442          480.58  0.118          0.128    -0.01030
Alex S Gonzalez      1906          199          219.12  0.104          0.115    -0.01056
Barry Larkin         2179          260          284.27  0.119          0.130    -0.01114
Rafael Furcal        3501          420          461.64  0.120          0.132    -0.01189
David Eckstein       3562          356          400.26  0.100          0.112    -0.01243
Nomar Garciaparra    2019          204          230.57  0.101          0.114    -0.01316
Felipe Lopez         1264          143          165.30  0.113          0.131    -0.01764
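
For anyone who wants to check the columns against each other, here is a minimal sketch of how they appear to relate: DER is outs divided by balls in play, and Difference is actual DER minus predicted DER. The function names are hypothetical and this is not the model's own code; it only reproduces the arithmetic visible in the table.

```python
# Illustrative sketch only; function names are hypothetical, not from the model's code.

def der(outs: float, in_play: int) -> float:
    """Fraction of balls in play that were turned into outs."""
    return outs / in_play


def range_difference(in_play: int, actual_outs: int, predicted_outs: float) -> float:
    """Actual DER minus predicted DER; positive means more outs than predicted."""
    return der(actual_outs, in_play) - der(predicted_outs, in_play)


# Rows copied from the 2004 shortstop table: (in play, actual outs, predicted outs).
rows = {
    "Pokey Reese": (1532, 206, 200.75),
    "Derek Jeter": (4178, 493, 521.56),
    "Felipe Lopez": (1264, 143, 165.30),
}

for name, (in_play, actual, predicted) in rows.items():
    diff = range_difference(in_play, actual, predicted)
    print(f"{name:<14} DER {der(actual, in_play):.3f}  difference {diff:+.5f}")
```

Because both rates share the same denominator, the Difference is just (actual outs - predicted outs) divided by balls in play.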

One hypothesis for the overall poor play by shortstops in 2004 is the aging of the big players. Vizquel, Jeter, Garciaparra and Tejada are not youngsters anymore. A-Rod moving out of the position hurt also. All of these players will be a year older in 2005; it will be interesting to see if there is further decline in the position as a whole.

It looks like the Nationals got a decent vacuum cleaner at short with their signing of Cristian Guzman. With all the talk about Rich Aurilia being old and broken down, he did a very good job fielding. It also appears that the Angels got a nice upgrade replacing Eckstein with Cabrera. If Eck fields like that for the Cardinals, don't expect that team to be number one in defense again next season.

Pokey Reese, who was supposed to spend most of his time at 2nd base before the Nomar Garciaparra injury, had the best range at shortstop in the majors in 2004. Nomar was down near the bottom. This gives us a chance to evaluate the Red Sox shortstops.

Boston Red Sox Shortstops, 2004 (Minimum 10 balls in play)
Player            In Play  Actual Outs  Predicted Outs    DER  Predicted DER  Difference
Cesar Crespo          288           36           32.83  0.125          0.114     0.01101
Pokey Reese          1532          206          200.75  0.134          0.131     0.00343
Orlando Cabrera      1465          174          180.47  0.119          0.123    -0.00442
Ricky Gutierrez       106           13           13.89  0.123          0.131    -0.00838
Nomar Garciaparra     964           85          110.10  0.088          0.114    -0.02604
Mark Bellhorn          32            3            4.23  0.094          0.132    -0.03840

So if we go back to the Garciaparra/Cabrera trade, we can now see it in its full light. It wasn't that the Red Sox defense had been bad all year; it's that it was bad with Nomar at shortstop. With Reese injured, Boston figured they needed another fielder at the position. However, Boston may have jumped the gun. There's some evidence that Nomar was just rusty. Compare Nomar with Cabrera after the trade:

SS Range, 2004    Nomar with Cubs    Cabrera with Red Sox
In Play                      1055                    1465
Actual Outs                   119                     174
Predicted Outs             120.47                  180.47
DER                          .113                    .119
Predicted DER                .114                    .123
Difference               -0.00139                -0.00442

So after the trade, Garciaparra had better range than Cabrera. Yes, Cabrera was able to play more. The uncertainty of Nomar's future health was certainly a factor in the deal. But given Nomar's play the rest of the way, Boston could have done without the trade and been just as good on defense, with Crespo or Reese (once he got off the DL) spelling Nomar occasionally. I felt at the time that defense was an excuse to move a player the Red Sox no longer wanted. This data does nothing to change my mind on the matter.


Posted by David Pinto at 05:40 PM | Defense | TrackBack (3)
Comments

Thanks for sharing the results!

I've been putting together a close study of (MGL's) UZR, Diamond Mind, (my) DRA, (BP's) DFT and (STATS, Inc.'s) ZR. Cabrera was clearly the most consistently good and durable shortstop during that time period under all these metrics.

In a sample of half a season, things don't average out, even if you have great PBP data. Tangotiger has strongly suggested that, even with UZR, you really need two years of data.

Posted by: Michael Humphreys at January 29, 2005 07:55 PM
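
To put a rough number on the half-season sample point, here is a back-of-the-envelope sketch. It assumes every ball in play is an independent trial with the same out probability, which is a simplification of mine and not how the model treats balls in play. Under that assumption the standard error of an observed DER over n balls is sqrt(p(1-p)/n). The function name is hypothetical.

```python
import math


def der_standard_error(p: float, n: int) -> float:
    """Binomial standard error of an observed out rate.

    Assumes n independent balls in play, each an out with probability p.
    That is a simplifying assumption, not part of the range model.
    """
    return math.sqrt(p * (1.0 - p) / n)


# (balls in play, predicted DER, Difference) from the post-trade comparison table.
samples = {
    "Nomar with Cubs": (1055, 0.114, -0.00139),
    "Cabrera with Red Sox": (1465, 0.123, -0.00442),
}

for label, (n, predicted_der, difference) in samples.items():
    se = der_standard_error(predicted_der, n)
    print(f"{label}: difference {difference:+.5f}, standard error about {se:.4f}")
```

On those assumptions the standard error for each player comes out a little under 0.01, which is larger than either post-trade Difference.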

Looks like Jeter is still a problem at -28 plays, or approximately -18 runs. Same as it ever was.

His non-PBP stats had looked better according to DFT.

Posted by: Michael Humphreys at January 29, 2005 07:58 PM

Being a Tiger fan, I always check to see how the Tigers do when I see new stats reported. I'm somewhat surprised to see Carlos Guillen with the highest actual DER among starters. Slightly more surprising is that the .138 predicted for him was among the highest as well. Don't know if it's the long grass at Comerica slowing balls down, or maybe, just maybe the Tigers staff isn't so bad.

Great stuff David, can't wait for the rest of the positions.

Posted by: billfer at January 29, 2005 08:33 PM

Hindsight, David, hindsight. Perfect vision.

I watched Nomar. He was horrendous. He could not get to any ball, and his arm wasn't as powerful. How can you possibly say that the Red Sox weren't justified? What if Nomar continued his pisspoor range? What then? We had no idea.

Not only that, but let's assume Nomar stays. We don't win the World Series, let alone the ALCS.

- Nomar was a cancer, like Sammy Sosa. He had to go. HAD to. He's a nice guy, but this year, he was a cancer.

- OC brought a spark. HE's the one that turned this all around.

- Minky brought defense at first and took a ball (hee hee). But seriously, Minky's defense helped a hell of a lot.

- Dave Roberts ... well, that goes without saying.

The fact is Nomar needed to go. The Red Sox needed him to go. So he went. And we got a better defensive shortstop. Sure, in hindsight it's a wash - but no Minky, no Roberts.

I don't think you can use hindsight to judge this trade on just defense. What about offense? Intangibles? A spark? No, this trade was genius.

Posted by: Evan at January 29, 2005 10:18 PM

Evan,

If Theo had made the argument you made, I'd be fine. But he didn't. He made a pure defensive argument. And the cancer in the clubhouse argument just doesn't hold water. The A's and Yankees of the 1970's both had Reggie Jackson, and both won. The fact is that Nomar's bad period was a small sample size; and as it turned out, it was the anomaly. I'm not saying it was a bad trade. It was an unneeded trade, and the reasons given to the public were spin.

Posted by: David Pinto at January 29, 2005 10:30 PM

What does Dave Roberts have to do with the Nomar trade?

Posted by: Bill at January 30, 2005 01:32 AM

You're judging the motivation for the Nomar trade by the wrong standard. The fact is that Nomar was terrible at SS during his stint with the RS. I don't know what was in people's heads at the time but I did see every RS game and it looked to me like Nomar (a) did not want to be there, and (b) was not going to get any better. Who knows what might have happened, but at the rate batted balls were pouring through the left side of the infield it's tough to disagree with Theo's decision or the reasons he gave for making it. Red Sox defense did improve after the trade. Whether or not Nomar played better in Chicago after the trade is not relevant to what happened before it.

Posted by: Peter Boston at January 30, 2005 05:20 AM

Just as an interesting note:

I noticed that the good defenders were far better than their "below expected" counterparts, but to make sure I went ahead and calculated the average (sum of differences/38 players). The answer comes out to: -0.00543. If you assigned this value as "Average DER for MLB Shortstops," then the line would be drawn between Jose Valentin and Jack Wilson, which is very close to the exact middle of the pack.

So if you want to feel better, you can say Jose Valentin was "above average" rather than had a "negative DER." =P

--------------------------------------------

Did anyone else notice that the two ends of the curve are being held on one side by three guys and the other by 10+ guys? I have a nagging suspicion that not all of the negative numbers are from the players themselves, either. I don't have your model in my hands (or on my computer, rather), but it seems to me that these results smell fishy. The list is almost certainly in the right order, and the differences between players are almost certainly accurate, but my eyes keep drifting toward that tiny "Update" note in your first DER post, David. The thing about the model being built on "three years data."

Is it possible that your model is too heavily weighing the performances of the shortstop three years past? Taking a clear example: Because Miguel Tejada played shortstop for the A's from 2001-2003, Bobby Crosby's DER is almost entirely based on Miguel Tejada's performance. This might also explain your hypothesis that aging has a factor: since the model is essentially working around the past three years only, you could have a player literally being compared to himself. Only younger.

I love the work, though, so please keep it up!

PS - The Gold Glove is now completely meaningless to me. Thanks a lot, Derek.

Posted by: Inquisitor at January 30, 2005 06:20 AM
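
Inquisitor's average can be re-derived from the Difference column of the 2004 table. This is a minimal sketch with the 38 values typed in by hand in table order; the variable names are my own.

```python
# Difference column of the 2004 shortstop table, in table order
# (Pokey Reese first, Felipe Lopez last).
differences = [
    0.00343, 0.00229, 0.00168, 0.00074, 0.00035, -0.00015, -0.00066, -0.00085,
    -0.00093, -0.00105, -0.00144, -0.00169, -0.00177, -0.00181, -0.00415,
    -0.00434, -0.00456, -0.00475, -0.00496, -0.00574, -0.00593, -0.00647,
    -0.00684, -0.00751, -0.00818, -0.00836, -0.00848, -0.00860, -0.00918,
    -0.00962, -0.00962, -0.01030, -0.01056, -0.01114, -0.01189, -0.01243,
    -0.01316, -0.01764,
]

# Unweighted mean: every shortstop counts equally regardless of playing time.
mean_difference = sum(differences) / len(differences)

# How many shortstops sit above that mean.
above_mean = sum(1 for d in differences if d > mean_difference)

print(f"players: {len(differences)}")
print(f"mean difference: {mean_difference:.5f}")
print(f"players above the mean: {above_mean}")
```

Note that this is an unweighted mean. Weighting each player by balls in play would give a somewhat different figure.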

It's meaningless to you NOW?

What about when Raffy won it?

Posted by: Larry Mahnken at January 30, 2005 08:08 AM

The thing that bothers me about this method is that the important number -- "Difference" -- is really hard to interpret. Does a fielder need a positive Difference to be good? I don't think so. That standard demands near perfection -- get all the predicted outs, plus some. If having a positive difference isn't the goal -- then what is the cutoff for determining whether someone was 'good' versus 'average' versus 'poor'?

It seems to me that this would be better served by scaling Difference so that it is a percentage. So that we can compare someone who has 98.5% of the predicted number with someone who has 93%. As it is now, trying to compare someone with -0.00343 versus -0.00838 and -0.01316 -- well I can see a difference, but what does it mean?

Posted by: Daniel Zappala at January 30, 2005 10:57 AM
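
The percentage scale Daniel describes can be sketched directly from the table: divide actual outs by predicted outs. The function name is hypothetical, and this is only one possible rescaling, not something the model reports.

```python
def percent_of_predicted(actual_outs: int, predicted_outs: float) -> float:
    """Actual outs as a percentage of the outs the model predicted."""
    return 100.0 * actual_outs / predicted_outs


# (actual outs, predicted outs) copied from the 2004 shortstop table.
rows = {
    "Pokey Reese": (206, 200.75),
    "Derek Jeter": (493, 521.56),
    "Felipe Lopez": (143, 165.30),
}

for name, (actual, predicted) in rows.items():
    print(f"{name:<14} {percent_of_predicted(actual, predicted):6.1f}% of predicted outs")
```

On this scale 100% means the fielder matched the model exactly, so a cutoff for "good" or "poor" would still have to be chosen, for example relative to the league average.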

Inquisitor, the model for Bobby Crosby at home is approximately 1/3 Tejada, 1/2 all visiting shortstops, and 1/6 Crosby. I don't have a problem with that. There are all types of weighting schemes I could (and probably will) try. But the model for Crosby certainly isn't all Tejada.

Posted by: David Pinto at January 30, 2005 11:25 AM
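
As a rough illustration of that weighting, here is a sketch of a playing-time-weighted blend. The 1/3, 1/2 and 1/6 weights are the approximate shares given above; the three out rates are made-up placeholders, and the function name is hypothetical. The real model presumably works bucket by bucket on the play-by-play data rather than on a single rate.

```python
def blended_out_rate(components):
    """Weighted average of out rates; components is a list of (weight, out_rate)."""
    total_weight = sum(weight for weight, _ in components)
    return sum(weight * rate for weight, rate in components) / total_weight


# Weights are the approximate shares quoted for Crosby's home model.
# The out rates are HYPOTHETICAL placeholders, not values from the model.
crosby_home_model = [
    (1 / 3, 0.140),  # Tejada's prior seasons in Oakland (placeholder rate)
    (1 / 2, 0.125),  # all visiting shortstops (placeholder rate)
    (1 / 6, 0.135),  # Crosby himself (placeholder rate)
]

predicted = blended_out_rate(crosby_home_model)
print(f"blended predicted out rate: {predicted:.4f}")
```

With weights like these, even a large gap between Tejada and the visitors moves the blend by only a third of that gap.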

This is based on play by play data, like UZR, correct? Do you have any idea how David Eckstein can rank so low here and at the same time have a very high UZR?

Posted by: Rally Monkey at January 30, 2005 01:10 PM

"And the cancer in the clubhouse argument just doesn't hold water. The A's and Yankees of the 1970's both had Reggie Jackson, and both won."

Exactly. I feel the same way when people try to say so-and-so played poorly because he was injured. The injury argument just doesn't hold water. Kirk Gibson hit a home run off of Dennis Eckersley with an injured knee, so therefore healthy knees aren't necessary for playing well.

Posted by: (name missing) at January 30, 2005 01:30 PM

David, from what you're telling me, it seems like you're using combined totals for the base probabilities of a ball being turned into an out rather than rated totals. Obviously, trying to control for all of the possible individual differences between players would be prohibitive (and wouldn't accomplish much, anyway), but wouldn't it be more accurate to *rate* the probabilities rather than assign 1-3 players more weight simply because they happened to be the ones playing there the most?

Even if 1/2 the probability of a ball in play being turned into an out is based upon all the other players who have played in that ballpark under that situation, that still leaves the other 1/2 weighted to only a few individuals (less than three in most cases). The most extreme example would be if Pokey Reese played shortstop for one team that plays in a defense-inhibiting ballpark. Reese's exemplary play could potentially negate all of the "ballpark factor". This would leave his replacement of, say, Craig Counsell, looking like Felipe Lopez (to use your rankings), because Counsell would be playing in a roughly "neutral" ballpark that is actually *harming* his performance, thereby making him look worse than he actually is (probably far worse, because Reese is at the top of the pyramid, so his defense probably outstrips any negative ballpark factor for shortstops).

Granted, you will run into the same meta-sample problem regardless (players from the same division play in the same parks, and therefore those players' tendencies are weighted more heavily than a player who only played three games at the park). However, I think that a lot of control could be made out of simply expanding the sample size (to maybe 5-10 years?) and rating the contributions of each player to the probability, rather than weighting them based on playing time. That would essentially erase any of the problems that I am concerned with.

The Miguel Tejada/Bobby Crosby comparison was only an example, the easiest that I could come up with off the top of my head (being an A's fan). But my honest concerns deal with how specific players are essentially being compared more with their younger counterparts or the starters that they are replacing, rather than the actual probability that a ball will be turned into an out.

That being said, I want to make it absolutely clear that I'm just trying to offer some constructive criticism. If there's a reason why you don't feel my concerns are really worth worrying over, then by all means explain it to me. My statistics education only goes as far as upper-division research statistics courses, not hardcore math.

Posted by: Inquisitor at January 30, 2005 02:01 PM

To me, your data only confirms the veracity of Theo's reasoning that defense was the reason for the trade.

As others have pointed out, Epstein didn't have the luxury of hoping that Nomar was just "rusty."

Indeed, isn't that the whole point of using stats to make decisions, i.e., to remove the "gut feeling" aspect of evaluating players?

It's great that the numbers show Nomar got better after the trade and may in fact have been "rusty." But it's even better to have a GM that saw a problem and did something about it.

Meanwhile, you have absolutely nothing but pure supposition to support your argument "that defense was an excuse to move a player the Red Sox no longer wanted."

That's kind of weird for a stats guy isn't it?

Posted by: Edw. at January 31, 2005 05:26 AM

Inquisitor, re Derek Jeter/Gold Glove; were you saying that the stats under discussion were an average of 3 years; and that they led you to a conclusion or affirmation about the Gold Glove? Since Derek's Gold Glove was only for one year, how did the 3 year stats affect this? (If they did). Just curious.

Posted by: susan mullen at January 31, 2005 09:25 AM

Two words make me skeptical of these numbers:

Rich Aurilia

Posted by: Ralph Malph at January 31, 2005 08:51 PM

susan,

My concerns with the 3-year data used to create the model are completely separate from the Derek Jeter comment. Everyone who looks objectively at Jeter's numbers knows that he is a horrible fielding shortstop. I was just saying that Derek Jeter's winning of the Gold Glove award completely destroyed any prestige associated with the award, and David's PMR data only solidified that destruction. =P

Posted by: Inquisitor at February 2, 2005 02:28 AM

I don't think anyone will look at this, as I'm about 6 months late for this discussion, but I think this needs to be said. Mientkiewicz came over in the Nomar trade. So, Theo got Cabrera, who was certainly an upgrade over Nomar's first half (albeit maybe not his second half) AND Mientkiewicz, who was an upgrade over Millar. Maybe that still doesn't add enough to make this trade only about defense, but it certainly helps.

(Roberts came over in a separate trade with LA, so he can't really be considered in the shortstop discussion)

Posted by: Chip at June 30, 2005 09:47 AM

I am looking at this discussion well past its post date...

The statistical basis that creates expected DER is data from the last three years. If indeed there is a decline in overall performance due to aging, it might be easier to compare current performance of the players by looking at the standard deviation from mean of DER diff rather than value of DER diff.

Posted by: Paul Grossi at September 24, 2005 01:54 PM
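
Paul's suggestion can be sketched as a z-score: subtract the mean Difference across the 38 listed shortstops and divide by the standard deviation. This is a minimal sketch using the population standard deviation and an unweighted mean, both choices of mine; the function name is hypothetical.

```python
from statistics import mean, pstdev

# Difference column from the 2004 shortstop table, keyed by player.
differences = {
    "Pokey Reese": 0.00343, "Adam Everett": 0.00229, "Cristian Guzman": 0.00168,
    "Julio Lugo": 0.00074, "Rich Aurilia": 0.00035, "Bobby Crosby": -0.00015,
    "Jose C Lopez": -0.00066, "Jimmy Rollins": -0.00085, "Alex Gonzalez": -0.00093,
    "Neifi Perez": -0.00105, "Cesar Izturis": -0.00144, "Chris Woodward": -0.00169,
    "Carlos Guillen": -0.00177, "Chris Gomez": -0.00181, "Wilson Delgado": -0.00415,
    "Orlando Cabrera": -0.00434, "Khalil Greene": -0.00456, "Craig Counsell": -0.00475,
    "Jose Valentin": -0.00496, "Jack Wilson": -0.00574, "Ramon E Martinez": -0.00593,
    "Edgar Renteria": -0.00647, "Derek Jeter": -0.00684, "Jose Vizcaino": -0.00751,
    "Miguel Tejada": -0.00818, "Royce Clayton": -0.00836, "Michael Young": -0.00848,
    "Kazuo Matsui": -0.00860, "Deivi Cruz": -0.00918, "Omar Vizquel": -0.00962,
    "Alex Cintron": -0.00962, "Angel Berroa": -0.01030, "Alex S Gonzalez": -0.01056,
    "Barry Larkin": -0.01114, "Rafael Furcal": -0.01189, "David Eckstein": -0.01243,
    "Nomar Garciaparra": -0.01316, "Felipe Lopez": -0.01764,
}


def z_scores(values):
    """Map each player to (difference - mean) / population standard deviation."""
    data = list(values.values())
    center = mean(data)
    spread = pstdev(data)
    return {name: (value - center) / spread for name, value in values.items()}


scores = z_scores(differences)
for name in ("Pokey Reese", "Jose Valentin", "Derek Jeter", "Felipe Lopez"):
    print(f"{name:<14} {scores[name]:+.2f} standard deviations from the mean")
```

Measured this way, a player is compared with his 2004 peers rather than with a prediction built from three prior seasons, so a league-wide decline no longer pushes everyone negative.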