Category Archives: Probabilistic Model of Steals

March 24, 2010

Probabilistic Model of Steals, Catchers

Earlier this month I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. Subsequent posts examined outs, the score difference and the combination of all three. This post looks at running against catchers, comparing actual stolen base attempts against the expected number of attempts based on the model.

To review, I’m only concerned with the pure steal situation, with a runner on first only. I define score difference from the point of view of the offensive player (offensive score – defensive score). In looking at the data, a lead of seven runs in either direction seemed to be the point where teams really stopped running, so any lead of seven runs or greater is placed in either the 7 or the -7 bin. I also put all extra innings in the 10 bin for that category, since each extra inning starts the same, with the score tied. The data is from 1996 through 2008. The following table displays all catchers with at least 1500 steal situations against them defined by the three parameters. The catchers are ranked by the ratio of actual steal attempts to expected steal attempts, times 100. A ratio of 100 means that runners attempted steals exactly as often as expected against the catcher. A ratio over 100 means runners attempted steals more often than expected; under 100, less often. There are 105 catchers in the study:

Catcher Situations Expected Attempts Actual Attempts Ratio (100*Act/Exp) SB Pct
Yadier Molina 3491 283.088 170 60.05 47.6
Miguel Olivo 4423 375.390 244 65.00 59.4
Ivan Rodriguez 12110 1013.243 661 65.24 53.6
John Buck 3951 339.500 224 65.98 67.9
Joe Mauer 3609 308.226 206 66.83 55.8
Kurt Suzuki 1571 139.770 98 70.12 63.3
Rod Barajas 4739 397.824 284 71.39 63.0
David Ross 2563 219.334 165 75.23 52.7
Jason LaRue 5520 478.565 368 76.90 60.1
Kenji Johjima 2665 228.173 183 80.20 60.7
Gary Bennett 3656 308.084 248 80.50 77.0
Toby Hall 5085 436.791 354 81.05 65.0
Brian Schneider 5878 506.174 411 81.20 61.1
Benito Santiago 6126 526.579 431 81.85 65.9
Javier Valentin 2756 230.494 192 83.30 67.2
Sal Fasano 2704 220.099 184 83.60 66.3
Mike Macfarlane 2222 185.312 156 84.18 62.2
Chris Snyder 3033 257.299 218 84.73 65.6
Tom Prince 1793 149.280 128 85.74 52.3
Mike Lieberthal 8124 698.239 606 86.79 67.2
Dan Wilson 7507 616.322 536 86.97 64.4
Dioner Navarro 2757 237.818 208 87.46 67.8
A.J. Pierzynski 7473 623.044 547 87.79 72.9
Ronny Paulino 2203 190.284 168 88.29 67.9
Charles Johnson 7919 666.605 593 88.96 60.4
Brad Ausmus 11405 965.566 859 88.96 63.6
Brandon Inge 2831 250.173 225 89.94 61.8
Terry Steinbach 3546 306.729 277 90.31 65.3
Bengie Molina 7755 669.462 613 91.57 64.9
Gerald Laird 2694 224.456 206 91.78 60.2
Yorvit Torrealba 3346 284.549 263 92.43 66.9
Russ Martin 2996 256.413 241 93.99 69.7
Mike Matheny 8079 674.430 638 94.60 63.9
Chad Moeller 2713 228.116 217 95.13 72.8
Jason Kendall 13492 1165.752 1117 95.82 68.4
Ben Davis 3263 276.914 269 97.14 62.5
Ramon Castro 2210 175.687 171 97.33 67.3
Brian McCann 3059 258.820 252 97.36 76.6
Henry Blanco 4825 403.453 393 97.41 54.5
Keith Osik 2197 180.433 176 97.54 63.1
Tony Eusebio 2430 202.067 198 97.99 72.2
Vance Wilson 2147 172.713 170 98.43 56.5
Ramon Hernandez 8399 703.568 693 98.50 68.4
Eli Marrero 2001 169.988 168 98.83 60.1
Ryan Doumit 1522 134.526 133 98.87 73.7
Carlos Ruiz 1616 135.233 134 99.09 73.1
Brent Mayne 5414 450.856 448 99.37 68.1
Johnny Estrada 4054 338.078 336 99.39 74.4
Brian Johnson 2313 199.007 198 99.49 68.2
Charlie O’Brien 1903 163.668 163 99.59 56.4
Bobby Estalella 2050 167.431 167 99.74 72.5
Alberto Castillo 2509 206.503 206 99.76 57.3
Damian Miller 6900 573.742 574 100.04 61.8
Wiki Gonzalez 1700 148.420 149 100.39 58.4
Chad Kreuter 3372 282.629 284 100.49 62.7
Todd Pratt 2727 216.830 218 100.54 69.7
Michael Barrett 6268 544.408 551 101.21 77.7
A.J. Hinch 2382 198.742 202 101.64 71.3
Jose Molina 3059 262.264 267 101.81 55.8
Greg Myers 3051 259.269 264 101.82 65.5
Mike Napoli 1708 140.558 144 102.45 75.7
Javy Lopez 8272 689.877 714 103.50 69.0
Tom Lampkin 2906 238.296 247 103.65 62.8
Sandy Alomar 6038 499.116 518 103.78 70.3
Matt Treanor 1683 137.967 144 104.37 73.6
Victor Martinez 4543 377.035 396 105.03 71.7
Gregg Zaun 6435 551.635 592 107.32 74.2
Mike Difelice 3636 307.805 331 107.54 65.3
Mike Redmond 4276 368.289 398 108.07 66.3
Paul Bako 4759 410.146 444 108.25 65.5
Joe Oliver 2634 220.567 239 108.36 66.9
Jason Varitek 8810 718.876 779 108.36 70.3
Joe Girardi 4855 404.204 447 110.59 64.7
Jorge Fabregas 3287 281.795 312 110.72 62.8
Kelly Stinnett 3614 305.224 338 110.74 66.9
Geronimo Gil 1869 157.016 174 110.82 66.7
Einar Diaz 4719 380.470 427 112.23 64.4
Jeff Reed 2983 257.212 289 112.36 68.9
Brook Fordyce 4173 352.378 399 113.23 74.2
Mark Johnson 2312 183.390 208 113.42 63.5
Todd Greene 2154 178.777 204 114.11 72.5
Carlos Hernandez 1603 139.326 159 114.12 62.3
Jorge Posada 10001 812.733 928 114.18 68.0
Eddie Perez 3094 255.505 292 114.28 67.8
Jason Phillips 1761 158.158 182 115.07 76.9
Raul Casanova 2246 195.517 226 115.59 69.9
Matt Walbeck 2967 254.938 297 116.50 66.0
Kirt Manwaring 2381 207.384 242 116.69 69.4
Josh Bard 2605 228.321 267 116.94 80.5
Bill Haselman 2665 218.490 256 117.17 71.9
Josh Paul 1679 143.282 168 117.25 72.0
Rick Wilkins 1529 126.147 149 118.12 61.1
Pat Borders 1750 151.915 182 119.80 67.0
John Flaherty 6417 541.677 649 119.81 67.3
Paul LoDuca 6597 564.847 684 121.09 67.3
Todd Hundley 4415 381.661 464 121.57 72.8
Chris Widger 3861 334.454 407 121.69 75.2
Scott Servais 3244 278.522 343 123.15 69.7
Darrin Fletcher 4843 406.006 508 125.12 74.4
Ed Taubensee 3371 287.210 363 126.39 79.3
Doug Mirabelli 3138 258.871 362 139.84 72.1
Mike Piazza 8634 758.514 1071 141.20 76.1
Chris Hoiles 2195 184.095 260 141.23 77.3
Lenny Webster 1897 157.755 227 143.89 71.8
Scott Hatteberg 2610 221.045 349 157.89 77.7
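
The ratio column can be checked by hand from the two attempt columns. Here is a minimal Python sketch of that arithmetic, using three rows copied from the table above. The function name `attempt_ratio` is my own label for illustration, not something from the original study.

```python
def attempt_ratio(actual_attempts: int, expected_attempts: float) -> float:
    """Return 100 * actual / expected, rounded to two decimals like the table."""
    if expected_attempts <= 0:
        raise ValueError("expected_attempts must be positive")
    return round(100.0 * actual_attempts / expected_attempts, 2)


# Rows copied from the table above: (catcher, expected attempts, actual attempts).
rows = [
    ("Yadier Molina", 283.088, 170),
    ("Ivan Rodriguez", 1013.243, 661),
    ("Scott Hatteberg", 221.045, 349),
]

for name, expected, actual in rows:
    # A value under 100 means runners tried less often than the model expects.
    print(f"{name}: {attempt_ratio(actual, expected):.2f}")
```

Note that the ratio only measures how often runners try. How often they succeed is the separate SB Pct column.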

With Yadier Molina injured this Wednesday afternoon, it seemed like a good time to look at this data. Molina is pretty amazing. Not only does he completely shut down the running game, the few who try to run against him make it less than 50% of the time. Note that his potential replacement, Jason LaRue, does a great job at stopping the running game as well.

Note also that Ivan Rodriguez and Benito Santiago earned their reputations as great arms behind the plate. Joe Mauer joins Yadier Molina as the next generation of outstanding throwers.

At the other end, Mike Piazza certainly earned his reputation as someone who was challenged at stopping the running game, and Scott Hatteberg needed to become a first baseman for more than just the injury.

The really interesting catchers to me are Yadier’s brother Jose, Wiki Gonzalez, and Joe Girardi. They all posted very low stolen base percentages against, yet that didn’t stop runners from trying to steal against them. There appears to be more than the catcher’s arm at work here.

The next post concentrates on combining pitchers and catchers. As always, I’m interested in your feedback. You can follow the series here.

Please consider supporting this work with a donation to the Baseball Musings Pledge Drive.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

March 15, 2010

Probabilistic Model of Steals, Pitchers

Last week I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. Subsequent posts examined outs, the score difference and the combination of all three. This post looks at running against pitchers, comparing actual stolen base attempts against the expected number of attempts based on the model.

To review, I’m only concerned with the pure steal situation, with a runner on first only. I define score difference from the point of view of the offensive player (offensive score – defensive score). In looking at the data, a lead of seven runs in either direction seemed to be the point where teams really stopped running, so any lead of seven runs or greater is placed in either the 7 or the -7 bin. I also put all extra innings in the 10 bin for that category, since each extra inning starts the same, with the score tied. The data is from 1996 through 2008. The following table displays all pitchers with at least 1000 steal situations against them defined by the three parameters. The pitchers are ranked by the ratio of actual steal attempts to expected steal attempts, times 100. A ratio of 100 means that runners attempted steals exactly as often as expected against the pitcher. A ratio over 100 means runners attempted steals more often than expected; under 100, less often. There are 107 pitchers in the study:

Pitcher Situations Expected Attempts Actual Attempts Ratio (100*Act/Exp) SB Pct
Terry Mulholland 1154 100.202 25 24.95 36.0
Eric Milton 1281 117.322 48 40.91 56.3
Kenny Rogers 2074 182.250 75 41.15 41.3
Glendon Rusch 1317 119.901 52 43.37 51.9
Randy Wolf 1242 112.470 49 43.57 59.2
Mark Redman 1075 95.037 42 44.19 40.5
Bartolo Colon 1781 154.946 71 45.82 47.9
Doug Davis 1378 123.680 59 47.70 47.5
Kirk Rueter 1490 130.046 63 48.44 33.3
Vicente Padilla 1100 98.365 50 50.83 62.0
Johan Santana 1213 101.715 53 52.11 58.5
Roy Oswalt 1275 110.900 58 52.30 55.2
Ron Villone 1006 82.247 46 55.93 43.5
Gil Meche 1115 102.910 60 58.30 50.0
Mark Buehrle 1572 135.266 80 59.14 41.3
Jon Garland 1421 128.122 79 61.66 49.4
Shawn Estes 1540 137.296 86 62.64 57.0
Carlos Zambrano 1175 101.824 64 62.85 46.9
Javier Vazquez 1793 163.019 103 63.18 66.0
Jimmy Haynes 1218 110.534 70 63.33 55.7
Matt Clement 1293 113.692 72 63.33 63.9
Mike Hampton 1785 154.497 98 63.43 40.8
Curt Schilling 1811 155.939 99 63.49 51.5
Chris Carpenter 1276 112.008 72 64.28 33.3
ChanHo Park 1645 148.123 99 66.84 49.5
Andy Pettitte 2231 190.596 128 67.16 59.4
Jon Lieber 1727 151.702 102 67.24 63.7
Brett Tomko 1442 129.909 92 70.82 68.5
Julian Tavarez 1055 86.615 62 71.58 69.4
Jarrod Washburn 1423 127.722 92 72.03 47.8
Brian Anderson 1080 95.656 69 72.13 44.9
Jason Marquis 1054 93.214 68 72.95 70.6
Shane Reynolds 1278 112.659 85 75.45 54.1
Wilson Alvarez 1025 89.904 68 75.64 47.1
Esteban Loaiza 1754 157.360 120 76.26 62.5
Barry Zito 1651 147.163 115 78.14 56.5
Matt Morris 1510 133.790 105 78.48 71.4
Dustin Hermanson 1022 88.987 72 80.91 62.5
Joel Pineiro 1008 87.280 71 81.35 70.4
Aaron Sele 1812 159.688 130 81.41 54.6
Jeff Weaver 1413 126.398 103 81.49 56.3
Jamie Moyer 2252 192.552 157 81.54 63.1
Livan Hernandez 2257 200.233 164 81.90 63.4
Ryan Dempster 1282 110.990 93 83.79 61.3
Woody Williams 1730 156.677 132 84.25 68.9
Jeff Suppan 1944 173.115 147 84.91 70.1
Kyle Lohse 1191 106.920 91 85.11 64.8
Darryl Kile 1302 114.318 100 87.48 65.0
John Burkett 1248 111.367 98 88.00 62.2
Ted Lilly 1115 98.401 88 89.43 70.5
Tom Glavine 2225 197.085 178 90.32 50.0
John Smoltz 1373 112.384 103 91.65 61.2
Jose Lima 1229 109.657 101 92.11 73.3
Denny Neagle 1115 98.259 91 92.61 63.7
Pat Hentgen 1264 110.896 104 93.78 58.7
Ramon Ortiz 1196 107.098 101 94.31 60.4
David Wells 1890 161.764 154 95.20 72.1
LaTroy Hawkins 1009 82.496 79 95.76 64.6
Kris Benson 1035 92.888 90 96.89 66.7
Brad Radke 1856 167.158 162 96.91 61.7
Pedro Astacio 1492 130.995 129 98.48 63.6
Paul Byrd 1414 125.932 125 99.26 68.8
Jason Schmidt 1732 156.500 156 99.68 72.4
Kevin Brown 1451 131.488 132 100.39 62.9
Mark Mulder 1117 98.075 99 100.94 48.5
Brian Moehler 1139 103.192 105 101.75 73.3
C.C. Sabathia 1380 120.751 123 101.86 59.3
Ismael Valdez 1372 125.449 128 102.03 69.5
Kip Wells 1119 102.588 107 104.30 67.3
Sidney Ponson 1498 133.750 140 104.67 65.0
James Baldwin 1154 100.898 107 106.05 66.4
Tim Hudson 1633 139.449 149 106.85 72.5
Odalis Perez 1144 106.697 115 107.78 65.2
Darren Oliver 1379 119.359 129 108.08 57.4
Rick Helling 1236 108.526 121 111.49 53.7
Ben Sheets 1144 105.657 120 113.58 75.8
Pedro Martinez 1744 151.320 172 113.67 74.4
John Lackey 1141 104.647 120 114.67 70.8
Kerry Wood 1063 91.551 105 114.69 64.8
Miguel Batista 1600 141.143 163 115.49 70.6
Mike Mussina 2129 188.919 219 115.92 63.9
Al Leiter 1731 153.348 178 116.08 59.0
Freddy Garcia 1436 127.352 148 116.21 80.4
Brad Penny 1245 111.997 131 116.97 74.8
John Thomson 1133 97.124 114 117.38 73.7
Jamey Wright 1508 138.030 163 118.09 67.5
Kevin Appier 1252 113.990 136 119.31 59.6
Russ Ortiz 1407 123.909 150 121.06 66.7
Scott Erickson 1194 105.938 129 121.77 72.9
Jeff Fassero 1275 109.263 135 123.56 56.3
Kevin Millwood 1746 157.319 196 124.59 80.1
Roy Halladay 1457 124.377 156 125.43 78.2
Chuck Finley 1257 110.961 143 128.87 60.8
Cory Lidle 1038 89.962 116 128.94 64.7
Dave Burba 1154 103.217 135 130.79 74.8
Steve Trachsel 1917 173.429 234 134.93 68.4
Derek Lowe 1635 134.644 182 135.17 78.0
Kelvim Escobar 1317 112.664 167 148.23 79.6
Andy Ashby 1073 95.442 145 151.92 69.0
Jason Johnson 1222 111.988 173 154.48 74.6
Brandon Webb 1072 96.995 151 155.68 76.8
Andy Benes 1051 90.679 142 156.60 67.6
A.J. Burnett 1215 104.858 165 157.36 73.3
Orlando Hernandez 1110 99.475 162 162.85 71.0
Roger Clemens 1912 164.859 271 164.38 73.4
Randy Johnson 2093 180.541 310 171.71 63.5
Greg Maddux 2099 191.016 359 187.94 76.6
Hideo Nomo 1526 139.598 268 191.98 73.1
Tim Wakefield 2093 186.345 369 198.02 78.9
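
For readers who want to see how an expected attempts figure might be built up, here is a rough Python sketch. It bins each situation the way the review paragraph describes (extra innings go in the 10 bin, leads of seven or more in the 7 or -7 bin) and then adds up the attempt probability for each situation. The summation is my assumption about how the expectation is formed, and the names `bin_key` and `expected_attempts` are made up for illustration. The three probabilities are the rounded values from the combined-parameters table in this series, and the list of situations is invented.

```python
from typing import Dict, Iterable, Tuple

Bin = Tuple[int, int, int]  # (inning bin, outs, score-difference bin)

MAX_SCORE_DIFF = 7     # leads of seven or more share one bin, per the text
EXTRA_INNING_BIN = 10  # all extra innings share the 10 bin, per the text


def bin_key(inning: int, outs: int, score_diff: int) -> Bin:
    """Map a raw game state to its (inning, outs, score difference) bin."""
    inning_bin = min(inning, EXTRA_INNING_BIN)
    diff_bin = max(-MAX_SCORE_DIFF, min(MAX_SCORE_DIFF, score_diff))
    return (inning_bin, outs, diff_bin)


def expected_attempts(situations: Iterable[Bin],
                      attempt_prob: Dict[Bin, float]) -> float:
    """Assumed model: sum the attempt probability of each situation's bin."""
    return sum(attempt_prob[bin_key(*s)] for s in situations)


# Rounded probabilities from the combined-parameters table in this series.
attempt_prob = {
    (1, 0, 0): 0.170,
    (9, 2, 0): 0.148,
    (10, 2, 0): 0.132,
}

# Invented example: four runner-on-first situations faced by one pitcher,
# given as (inning, outs, offensive score minus defensive score).
faced = [(1, 0, 0), (1, 0, 0), (9, 2, 0), (12, 2, 0)]

print(round(expected_attempts(faced, attempt_prob), 3))
```

The twelfth-inning situation falls into the 10 bin, so it picks up the extra-innings probability. A real run would need a probability for every bin that occurs, not just the three shown here.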

At the top of the list are a number of pitchers one might expect, left-handers with good moves to first base like Mulholland and Rogers. I’m a bit surprised Andy Pettitte doesn’t rank higher, but I think it’s a combination of two things. Pettitte pitched in a lot more stolen base situations than many of the others at the top of the list, so the larger sample size might bring him back closer to average. The bigger factor, however, may be his high number of pickoffs. Most pickoffs are actually scored as caught stealings, since the runner often heads toward second base hoping for an error. Note the low success of runners against Pettitte.

Chris Carpenter is simply amazing at not only stopping the running game, but also preventing a stolen bag once a runner commits. Part of that is having Yadier Molina as a catcher, but Carpenter’s only had Molina behind the plate for part of his career. We’ll look into that relationship in more detail in a later post. I’m hoping this research eventually leads to a way to distinguish between catcher defense of the steal and pitcher defense of the steal.

At the other end of the scale, where runners attempt steals more often than expected against pitchers, lie three of the greatest hurlers of the period, Roger Clemens, Greg Maddux and Randy Johnson. These three appeared to pitch to a philosophy that the stolen base didn’t matter. They paid little attention to base runners, concentrating on getting the out at the plate. If the hitter makes an out, the chance of the runner scoring drops close to zero.

I’m not surprised the two easiest pitchers to steal on were knuckleballer Tim Wakefield and windup artist Hideo Nomo. The slow speed of Wakefield’s pitches combined with the difficulty of catching them makes Tim an easy target. Nomo’s back arch delivery allowed runners to get an extra step as well.

The next post concentrates on the catchers. As always, I’m interested in your feedback. You can follow the series here.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

March 10, 2010

Probabilistic Model of Steals, Base Runners

Monday I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. Subsequent posts examined outs, the score difference and the combination of all three. This post looks at base runners’ actual stolen base attempts compared to their expected stolen base attempts based on the model.

To review, I’m only concerned with the pure steal situation, with a runner on first only. I define score difference from the point of view of the offensive player (offensive score – defensive score). In looking at the data, a lead of seven runs in either direction seemed to be the point where teams really stopped running, so any lead of seven runs or greater is placed in either the 7 or the -7 bin. I also put all extra innings in the 10 bin for that category, since each extra inning starts the same, with the score tied. The data is from 1996 through 2008. The following table displays all players with at least 500 steal situations defined by the three parameters. The players are ranked by the ratio of actual steal attempts to expected steal attempts, times 100. A ratio of 100 means the player attempted steals exactly as often as expected. A ratio over 100 means the player attempted steals more often than expected; under 100, less often. There are 345 players in the study:

Runner Situations Expected Attempts Actual Attempts Ratio (100*Act/Exp) SB Pct
Tom Goodwin 1002 97.647 320 327.71 75.9
Jose Reyes 750 81.581 262 321.15 78.6
Scott Podsednik 750 76.785 246 320.38 73.2
Corey Patterson 667 59.971 184 306.81 77.2
Carl Crawford 925 90.390 268 296.49 81.0
Reggie Sanders 820 69.148 198 286.34 69.7
Chone Figgins 863 87.515 243 277.67 72.8
Dave Roberts 910 95.793 264 275.59 81.1
Juan Pierre 1592 165.175 445 269.41 71.0
Willy Taveras 619 63.959 171 267.36 81.3
Raul Mondesi 933 72.388 189 261.09 65.6
Eric Young 1404 145.539 376 258.35 72.9
Delino DeShields 791 75.139 194 258.19 77.8
Brian Hunter 764 77.665 200 257.52 78.5
Hanley Ramirez 506 51.379 132 256.91 75.8
Roger Cedeno 885 86.766 219 252.40 75.8
Ryan Freel 554 54.287 137 252.36 73.0
Mike Cameron 1316 113.007 284 251.31 79.2
Otis Nixon 591 63.168 157 248.54 80.9
Bobby Abreu 1696 143.936 341 236.91 73.3
Tony Womack 1299 138.954 329 236.77 80.9
Alfonso Soriano 998 95.517 226 236.61 78.3
Eric Owens 608 56.007 127 226.76 74.0
Juan Encarnacion 914 70.814 155 218.88 69.0
Rickey Henderson 1084 116.887 247 211.32 77.7
Ray Lankford 709 58.653 122 208.00 72.1
Preston Wilson 775 61.133 127 207.74 66.9
Darren Lewis 601 57.335 119 207.55 67.2
Damian Jackson 617 56.988 117 205.31 76.9
Al Martin 689 61.548 126 204.72 75.4
Alexis Rios 545 49.325 100 202.74 76.0
Luis Castillo 1968 200.748 404 201.25 70.0
Jimmy Rollins 1209 127.275 256 201.14 80.5
Rafael Furcal 1278 136.906 274 200.14 76.6
Coco Crisp 748 70.609 140 198.28 71.4
Edgar Renteria 1753 157.989 312 197.48 72.4
Torii Hunter 1005 77.498 153 197.42 65.4
Jerry Hairston 741 67.616 132 195.22 65.2
Pokey Reese 691 62.349 120 192.46 83.3
Devon White 568 54.070 103 190.49 73.8
Eric Byrnes 622 56.338 107 189.93 85.0
Marvin Benard 697 70.122 131 186.82 71.0
Adam Kennedy 940 81.251 151 185.84 70.2
Chuck Knoblauch 1130 121.127 223 184.10 76.7
Miguel Cairo 831 68.531 124 180.94 74.2
Gerald Williams 596 56.524 102 180.45 57.8
Lance Johnson 532 56.804 102 179.56 70.6
Quinton McCracken 672 58.877 105 178.34 61.9
Kevin Young 646 47.103 84 178.33 57.1
Brian Roberts 1038 108.245 189 174.60 75.7
Quilvio Veras 809 84.303 146 173.18 63.7
Gabe Kapler 640 49.834 86 172.57 73.3
Aaron Boone 836 62.384 107 171.52 81.3
David Wright 663 51.562 88 170.67 78.4
Todd Hollandsworth 638 55.170 94 170.38 66.0
Grady Sizemore 728 76.738 130 169.41 79.2
Johnny Damon 2123 223.268 377 168.86 76.9
Carlos Beltran 1360 120.407 202 167.76 87.1
Felipe Lopez 776 70.647 118 167.03 66.9
Andy Fox 501 44.007 72 163.61 70.8
Marquis Grissom 1033 96.308 155 160.94 65.8
Mark McLemore 1107 104.241 167 160.21 68.3
Stan Javier 581 51.231 82 160.06 81.7
Omar Vizquel 1789 167.064 267 159.82 67.4
Royce Clayton 1229 106.595 170 159.48 67.6
Ray Durham 1777 176.223 280 158.89 70.0
Milton Bradley 698 56.693 90 158.75 67.8
Michael Tucker 942 82.137 128 155.84 66.4
Kenny Lofton 1854 200.977 313 155.74 76.4
Cliff Floyd 1058 88.144 135 153.16 77.0
Endy Chavez 511 48.098 73 151.77 67.1
Randy Winn 1384 129.979 196 150.79 63.8
Cesar Izturis 765 71.909 108 150.19 63.9
Rickie Weeks 510 53.318 80 150.04 83.8
Julio Lugo 1131 106.704 160 149.95 70.0
Jeffrey Hammonds 571 45.872 68 148.24 64.7
Chad Curtis 543 48.573 72 148.23 66.7
Desi Relaford 705 57.545 85 147.71 75.3
Carl Everett 959 76.101 112 147.17 65.2
Doug Glanville 1051 107.133 157 146.55 80.9
Emil Brown 518 37.862 55 145.26 81.8
Ichiro Suzuki 1720 185.090 268 144.79 80.2
Richard Hidalgo 630 47.028 68 144.59 55.9
Matt Lawton 1328 120.670 174 144.19 68.4
Vladimir Guerrero 1420 114.477 165 144.13 64.2
Andruw Jones 1290 105.652 152 143.87 69.7
Marlon Anderson 683 53.165 76 142.95 69.7
Alex Rodriguez 1879 157.790 224 141.96 80.8
Dante Bichette 693 50.770 72 141.82 69.4
Shannon Stewart 1588 166.998 236 141.32 72.9
Brian McRae 548 52.358 73 139.42 60.3
Jose Cruz 1013 84.770 117 138.02 73.5
Brady Clark 641 54.601 75 137.36 58.7
Ron Gant 632 51.727 71 137.26 63.4
Jose Valentin 910 76.261 104 136.37 70.2
Brady Anderson 934 96.851 132 136.29 72.0
Corey Koskie 762 56.083 76 135.51 63.2
Shawn Green 1502 120.482 163 135.29 74.8
Eric Hinske 544 44.347 59 133.04 72.9
Carlos Lee 1118 82.069 109 132.82 71.6
Darin Erstad 1544 149.897 198 132.09 75.3
Gary Matthews 899 79.923 105 131.38 69.5
Barry Bonds 1340 113.794 148 130.06 79.1
Alex Gonzalez 848 74.044 95 128.30 63.2
Orlando Cabrera 1310 116.646 149 127.74 75.2
Dave Martinez 653 55.627 71 127.64 60.6
Cristian Guzman 1094 103.615 132 127.39 62.1
Matt Holliday 623 48.850 62 126.92 75.8
Rob Mackowiak 526 42.626 54 126.68 72.2
Jeff Bagwell 1345 114.807 144 125.43 63.9
Steve Finley 1359 117.528 146 124.23 68.5
Jose Canseco 507 38.789 46 118.59 67.4
Jose Offerman 1111 110.160 129 117.10 60.5
Barry Larkin 950 88.129 103 116.87 76.7
Andres Galarraga 696 54.181 63 116.28 50.8
Marty Cordova 658 48.031 55 114.51 50.9
Ellis Burks 814 69.062 79 114.39 70.9
Denny Hocking 540 43.067 49 113.78 53.1
Gary Sheffield 1632 138.935 158 113.72 68.4
Paul O’Neill 713 59.282 67 113.02 76.1
Fernando Tatis 577 44.666 50 111.94 68.0
Darren Bragg 587 51.998 58 111.54 63.8
Brad Fullmer 533 39.989 44 110.03 56.8
Jason Kendall 2067 189.237 208 109.92 67.3
Nomar Garciaparra 1195 102.078 112 109.72 75.0
Craig Counsell 1134 106.284 116 109.14 67.2
Chris Stynes 585 49.510 54 109.07 77.8
F.P. Santangelo 513 45.879 50 108.98 66.0
Aaron Rowand 741 61.491 67 108.96 70.1
Fernando Vina 1107 118.401 129 108.95 61.2
Carlos Guillen 960 80.930 88 108.74 60.2
Damion Easley 1066 91.230 99 108.52 69.7
Derrek Lee 1281 104.191 113 108.45 69.9
Mickey Morandini 744 67.331 73 108.42 72.6
Scott Brosius 590 47.096 51 108.29 60.8
Roberto Alomar 1203 115.023 124 107.80 81.5
Jeromy Burnitz 1094 82.591 89 107.76 55.1
Julio Franco 578 49.336 53 107.43 64.2
Brad Wilkerson 775 70.531 75 106.34 53.3
Rich Becker 508 43.324 46 106.18 73.9
Larry Walker 1103 91.649 97 105.84 74.2
Bobby Higginson 1097 92.287 97 105.11 62.9
Brian Jordan 834 68.801 72 104.65 70.8
Angel Berroa 561 46.969 49 104.32 69.4
Greg Vaughn 706 55.721 58 104.09 70.7
Derek Bell 643 59.773 62 103.73 66.1
Randy Velarde 599 55.588 57 102.54 68.4
Travis Lee 854 66.328 68 102.52 75.0
Derek Jeter 2353 231.526 237 102.36 75.1
Rondell White 1002 81.155 83 102.27 60.2
Chase Utley 687 60.360 61 101.06 85.2
Bret Boone 1152 94.439 95 100.59 62.1
Ivan Rodriguez 1365 116.575 117 100.36 73.5
Magglio Ordonez 1249 94.919 94 99.03 62.8
Scott Rolen 1328 102.439 101 98.60 72.3
Luis Alicea 633 57.909 57 98.43 59.6
Sammy Sosa 1178 96.948 95 97.99 60.0
Marcus Giles 748 76.575 75 97.94 70.7
Ryan Klesko 1153 91.536 89 97.23 70.8
Tadahito Iguchi 509 50.605 49 96.83 77.6
Adrian Beltre 1219 97.151 94 96.76 72.3
J.D. Drew 1156 100.366 97 96.65 76.3
Brandon Inge 710 60.587 58 95.73 56.9
Ty Wigginton 605 44.956 43 95.65 65.1
Jacque Jones 972 86.883 83 95.53 59.0
Melvin Mora 1153 105.811 100 94.51 60.0
Jay Payton 876 70.074 66 94.19 57.6
Mark Kotsay 1325 125.692 118 93.88 60.2
Jason Bay 664 50.231 47 93.57 85.1
Mike Lansing 614 57.749 54 93.51 68.5
Jeff Kent 1316 105.894 98 92.55 66.3
David Dellucci 714 58.676 54 92.03 57.4
David Eckstein 1302 139.241 128 91.93 71.1
Tony Batista 839 64.230 59 91.86 62.7
Tony Graffanino 756 65.483 60 91.63 66.7
Bernie Williams 1473 120.060 110 91.62 62.7
Orlando Palmeiro 721 62.513 57 91.18 50.9
Eric Karros 837 61.552 56 90.98 71.4
Mike Bordick 784 68.440 62 90.59 61.3
Darryl Hamilton 754 79.630 72 90.42 59.7
Todd Walker 1122 98.623 89 90.24 66.3
Mark Grudzielanek 1662 157.550 141 89.50 68.8
D’Angelo Jimenez 594 56.185 50 88.99 62.0
Juan Uribe 670 57.331 51 88.96 43.1
Craig Biggio 1970 205.658 182 88.50 76.4
Jamey Carroll 651 59.800 52 86.96 57.7
Casey Blake 739 62.255 53 85.13 54.7
Marlon Byrd 568 49.824 42 84.30 66.7
Jose Vizcaino 799 69.412 58 83.56 48.3
Morgan Ensberg 558 45.552 38 83.42 55.3
Vernon Wells 819 67.607 56 82.83 73.2
Neifi Perez 1176 105.430 87 82.52 57.5
David DeJesus 729 75.352 62 82.28 51.6
Brad Ausmus 1288 111.824 92 82.27 59.8
Albert Belle 598 45.692 37 80.98 70.3
Adam Dunn 901 68.418 55 80.39 74.5
Nick Johnson 551 44.800 36 80.36 55.6
Jose Hernandez 877 67.511 54 79.99 51.9
Jason Michaels 500 41.495 33 79.53 54.5
John Valentin 540 47.318 37 78.19 48.6
Chipper Jones 1758 151.050 117 77.46 76.9
Lance Berkman 1209 95.993 74 77.09 62.2
Brian Giles 1687 143.191 110 76.82 70.9
Greg Norton 534 35.283 27 76.52 44.4
Ricky Gutierrez 727 56.755 43 75.76 62.8
Abraham Nunez 659 56.032 42 74.96 66.7
Mark Bellhorn 552 46.938 35 74.57 71.4
Tony Phillips 606 64.286 46 71.56 52.2
Travis Fryman 712 54.662 39 71.35 64.1
Moises Alou 1177 89.838 64 71.24 75.0
Troy Glaus 1141 88.202 62 70.29 61.3
Orlando Hudson 711 61.344 43 70.10 67.4
Mark Ellis 658 60.434 42 69.50 73.8
Aaron Miles 557 49.660 34 68.47 61.8
Alex Cora 635 52.614 36 68.42 61.1
Garret Anderson 1525 118.672 81 68.26 56.8
Luis Gonzalez 1644 132.963 90 67.69 62.2
Jim Edmonds 1358 114.905 76 66.14 55.3
Jeff Conine 1178 89.646 59 65.81 66.1
Jack Wilson 907 80.359 51 63.47 52.9
B.J. Surhoff 1002 80.404 51 63.43 64.7
Jeff Cirillo 1238 104.076 66 63.42 54.5
Austin Kearns 616 46.056 29 62.97 65.5
Jay Bell 839 74.668 47 62.95 59.6
Joey Cora 501 54.068 34 62.88 58.8
Deivi Cruz 710 55.861 35 62.66 31.4
Terrence Long 659 56.121 35 62.37 65.7
Frank Catalanotto 958 88.345 55 62.26 63.6
Scott Spiezio 806 61.425 38 61.86 50.0
Jermaine Dye 1183 91.368 56 61.29 66.1
Michael Young 1277 117.907 71 60.22 74.6
Matt Williams 626 48.277 29 60.07 72.4
Ken Griffey 1316 111.965 67 59.84 68.7
Marco Scutaro 506 43.631 26 59.59 76.9
Dean Palmer 637 46.083 27 58.59 66.7
Reed Johnson 647 65.917 38 57.65 52.6
Gregg Zaun 686 52.106 30 57.57 56.7
Tim Salmon 1094 83.548 48 57.45 56.3
Raul Ibanez 1019 74.918 43 57.40 53.5
Rey Ordonez 604 51.275 29 56.56 51.7
Joe Randa 1231 96.486 54 55.97 59.3
Eric Chavez 955 77.456 43 55.52 69.8
Chris Gomez 860 65.059 36 55.33 44.4
Jose Guillen 1043 82.276 45 54.69 51.1
Matt Stairs 1029 76.209 41 53.80 58.5
Jason Varitek 966 74.464 40 53.72 60.0
Ronnie Belliard 1133 96.944 52 53.64 61.5
Trot Nixon 846 69.344 37 53.36 70.3
Rusty Greer 827 71.924 38 52.83 63.2
Placido Polanco 1322 126.600 66 52.13 65.2
Miguel Tejada 1462 117.958 61 51.71 62.3
Tony Gwynn 536 48.384 25 51.67 68.0
Rey Sanchez 852 73.565 38 51.65 63.2
Albert Pujols 1093 91.213 47 51.53 61.7
John Flaherty 521 36.890 19 51.50 26.3
Vinny Castilla 1054 79.919 41 51.30 48.8
Walt Weiss 586 54.592 28 51.29 75.0
Alex Gonzalez 730 62.888 31 49.29 54.8
Benito Santiago 551 40.748 20 49.08 45.0
Mike Sweeney 1092 86.567 42 48.52 59.5
Geoff Blum 690 55.770 27 48.41 44.4
Carlos Pena 631 45.524 22 48.33 59.1
Mark DeRosa 661 51.171 24 46.90 54.2
David Justice 721 56.738 26 45.82 46.2
Joe Mauer 552 48.121 22 45.72 81.8
Edgardo Alfonzo 1292 114.266 49 42.88 77.6
Wil Cordero 549 42.138 18 42.72 55.6
Jhonny Peralta 538 42.414 18 42.44 38.9
Brent Mayne 591 42.981 18 41.88 33.3
Rafael Palmeiro 1189 89.948 36 40.02 66.7
Aubrey Huff 924 67.916 27 39.75 51.9
Dmitri Young 987 76.401 30 39.27 40.0
Geoff Jenkins 939 76.964 30 38.98 73.3
Troy O’Leary 700 51.775 20 38.63 40.0
Fred McGriff 910 67.965 26 38.25 65.4
Mark Loretta 1621 149.443 57 38.14 59.6
Paul LoDuca 988 87.738 32 36.47 53.1
Todd Zeile 1046 82.776 30 36.24 56.7
Dan Wilson 714 58.157 21 36.11 66.7
Mike Lowell 1020 81.030 29 35.79 62.1
Shea Hillenbrand 696 53.433 19 35.56 52.6
David Bell 1015 83.875 29 34.58 48.3
Ben Grieve 714 52.435 18 34.33 83.3
Michael Barrett 728 55.448 19 34.27 47.4
Todd Helton 1546 121.017 41 33.88 46.3
Mark Grace 1070 84.733 28 33.04 35.7
J.T. Snow 1109 85.607 28 32.71 39.3
Ron Coomer 546 38.038 12 31.55 66.7
Aramis Ramirez 964 73.233 23 31.41 47.8
Hank Blalock 669 51.213 16 31.24 68.8
Robin Ventura 839 64.824 20 30.85 40.0
Pedro Feliz 570 45.729 14 30.62 57.1
Bill Mueller 1128 101.747 30 29.48 50.0
Manny Ramirez 1551 120.984 35 28.93 51.4
Kevin Mench 502 39.130 11 28.11 54.5
Richie Sexson 1003 71.685 20 27.90 45.0
Will Clark 550 43.010 12 27.90 58.3
Robert Fick 520 39.879 11 27.58 36.4
Juan Gonzalez 731 54.975 15 27.29 73.3
Sandy Alomar 552 41.382 11 26.58 36.4
Lyle Overbay 709 56.507 15 26.55 66.7
A.J. Pierzynski 764 61.733 16 25.92 31.3
Doug Mientkiewicz 798 65.713 17 25.87 29.4
Garrett Atkins 552 42.639 11 25.80 63.6
Rich Aurilia 1200 100.873 26 25.77 46.2
Hideki Matsui 746 55.470 14 25.24 71.4
Miguel Cabrera 797 63.621 16 25.15 50.0
Jose Vidro 1268 111.594 28 25.09 50.0
Justin Morneau 507 36.021 9 24.99 33.3
Edgar Martinez 1172 91.074 22 24.16 63.6
Jorge Posada 1190 95.190 23 24.16 43.5
Javy Lopez 982 74.245 17 22.90 29.4
Tino Martinez 1103 83.341 19 22.80 63.2
David Segui 745 57.416 13 22.64 38.5
Mike Matheny 661 53.426 12 22.46 25.0
Phil Nevin 838 62.892 14 22.26 71.4
Kevin Millar 940 69.807 15 21.49 46.7
Mark Teixeira 790 62.669 13 20.74 84.6
Mike Lieberthal 894 68.490 14 20.44 50.0
Frank Thomas 1358 113.386 23 20.28 69.6
Travis Hafner 586 46.185 9 19.49 44.4
Jay Gibbons 504 36.418 7 19.22 28.6
Mike Lamb 601 48.239 9 18.66 33.3
John Mabry 585 44.939 8 17.80 50.0
Tony Clark 846 61.855 11 17.78 36.4
Cal Ripken 664 52.065 8 15.37 12.5
Charles Johnson 726 59.970 9 15.01 33.3
Bengie Molina 725 58.517 8 13.67 12.5
Mo Vaughn 799 63.437 8 12.61 62.5
Freddy Sanchez 560 49.648 6 12.09 50.0
Sean Casey 1190 93.277 11 11.79 54.5
Brian Schneider 543 42.895 5 11.66 0.0
Jason Giambi 1500 119.698 13 10.86 61.5
Jim Thome 1612 129.198 14 10.84 35.7
David Ortiz 952 75.470 8 10.60 87.5
Mike Piazza 1165 96.778 10 10.33 30.0
Paul Konerko 1213 92.889 9 9.69 77.8
Ramon Hernandez 821 65.025 6 9.23 50.0
Scott Hatteberg 1028 86.382 7 8.10 14.3
John Olerud 1278 103.318 7 6.78 71.4
Carlos Delgado 1686 132.603 8 6.03 37.5
Damian Miller 638 51.136 3 5.87 33.3
Pat Burrell 951 74.024 4 5.40 75.0
Victor Martinez 581 44.644 2 4.48 0.0
Mark McGwire 605 51.381 2 3.89 100.0

That’s a pretty wide range of abilities. I especially like Jose Reyes and Carl Crawford at the top of the list, two players who run often, but also succeed often. Reggie Sanders and Raul Mondesi are less impressive, wasting a lot of their attempts with a caught stealing.

Derek Jeter comes out as one of the more average runners. His expectation was for 232 attempts, and he tried 237 times. His 75% steal rate is about break even in terms of runs for this era. Ivan Rodriguez, a catcher, comes closest to hitting his expectation on the button.

At the other end of the scale are catchers, first basemen and designated hitters, with Mark McGwire bringing up the rear with only two steal attempts when 51 were expected. There are actually quite a few excellent hitters with a ratio under 30.

One thing I would like to open for discussion: my unit of measure for situations is the plate appearance. If the lead-off hitter reaches first and stays there through three more batters, that counts as three situations for the runner. I’m not sure that is correct. The outs situation changes, which means the strategy changes, so I think it is right. Can anyone see a flaw in that argument? If the batter hits the first pitch, for example, was that really an opportunity for the runner to steal?
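
To make the counting question concrete, here is a small Python sketch that counts the same stretch of play two ways: once per plate appearance, as in this study, and once per uninterrupted stay on first. The `PlateAppearance` record and both function names are hypothetical, invented for this illustration, and the sketch ignores inning boundaries for simplicity.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class PlateAppearance:
    """State at the start of one plate appearance (hypothetical layout)."""
    occupied_bases: FrozenSet[int]  # frozenset({1}) means runner on first only
    runner_on_first: Optional[str]  # runner's name, or None if first is empty
    outs: int


def is_steal_situation(pa: PlateAppearance, runner: str) -> bool:
    """True when this runner is on first and no other base is occupied."""
    return pa.occupied_bases == frozenset({1}) and pa.runner_on_first == runner


def count_per_plate_appearance(pas: List[PlateAppearance], runner: str) -> int:
    """The study's unit: every plate appearance with the runner on first only."""
    return sum(1 for pa in pas if is_steal_situation(pa, runner))


def count_per_stay(pas: List[PlateAppearance], runner: str) -> int:
    """Alternative unit: each uninterrupted stay on first counts once."""
    stays = 0
    previous = False
    for pa in pas:
        current = is_steal_situation(pa, runner)
        if current and not previous:
            stays += 1
        previous = current
    return stays


# The lead-off hitter reaches first and stays there through three more batters.
inning = [
    PlateAppearance(frozenset(), None, 0),           # lead-off hitter bats
    PlateAppearance(frozenset({1}), "Leadoff", 0),   # second batter
    PlateAppearance(frozenset({1}), "Leadoff", 1),   # third batter
    PlateAppearance(frozenset({1}), "Leadoff", 2),   # fourth batter
]

print(count_per_plate_appearance(inning, "Leadoff"))  # plate-appearance unit
print(count_per_stay(inning, "Leadoff"))              # stay-on-first unit
```

The plate-appearance unit gives three situations here, matching the example in the paragraph above, while the per-stay unit gives one. Neither handles the first-pitch case, which would need pitch-level data.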

As always, I’m interested in your feedback. You can follow the series here.

Please consider supporting this work with a donation to the Baseball Musings Pledge Drive.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at 20 Sunset Rd., Newark, DE 19711.

March 9, 2010

Probabilistic Model of Steals, Combined Parameters

Monday I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. Subsequent posts examined outs and the score difference. This post looks at what happens when all three parameters are combined.

To review, I’m only concerned with the pure steal situation, only a runner on first. I define score difference from the point of view of the offensive player (Offensive score – defensive score). In looking at the data, a lead of seven runs in either direction seemed to be the point where teams really stopped running, so any lead of seven runs or greater is either placed in the 7 or -7 bin. I also put all extra innings in the 10 bin for that category, since each extra inning starts the same, with the score tied. The following table shows the probability of attempting a steal based on all three parameters. Only combinations with at least 1500 steal situations are displayed:

Inning Outs Score Difference Situations SB Attempts Steals Prob. of an Attempt SB Pct.
1 0 0 14212 2416 1672 0.170 0.692
9 2 0 1680 248 183 0.148 0.738
8 2 2 1672 243 167 0.145 0.687
8 2 0 1935 279 194 0.144 0.695
3 2 -1 2639 376 253 0.142 0.673
9 1 0 1761 248 176 0.141 0.710
7 2 1 1958 273 189 0.139 0.692
7 2 2 1714 239 177 0.139 0.741
1 1 0 13735 1913 1295 0.139 0.677
7 1 2 1727 231 156 0.134 0.675
8 1 2 1648 220 154 0.133 0.700
10 2 0 2197 290 204 0.132 0.703
2 2 2 1715 225 146 0.131 0.649
8 2 1 1752 229 161 0.131 0.703
8 1 1 1722 223 149 0.130 0.668
3 2 0 4677 609 426 0.130 0.700
4 2 2 1878 239 158 0.127 0.661
5 2 1 2478 312 202 0.126 0.647
8 1 0 2039 254 175 0.125 0.689
7 1 1 2003 248 140 0.124 0.565
7 2 0 2135 260 175 0.122 0.673
3 2 1 3194 386 274 0.121 0.710
5 1 2 1839 223 143 0.121 0.641
3 1 0 4910 589 374 0.120 0.635
10 1 0 2552 305 216 0.120 0.708
6 2 1 2262 264 173 0.117 0.655
3 1 1 2960 347 217 0.117 0.625
5 2 2 1911 223 150 0.117 0.673
5 2 0 2784 326 227 0.117 0.696
6 2 2 1796 208 143 0.116 0.688
3 1 2 1907 220 145 0.115 0.659
5 1 1 2417 272 173 0.113 0.636
3 0 1 2385 270 182 0.113 0.674
6 2 0 2350 263 186 0.112 0.707
5 2 -1 2346 260 175 0.111 0.673
6 1 2 1752 194 118 0.111 0.608
7 2 -1 2099 227 172 0.108 0.758
4 1 2 1791 193 115 0.108 0.596
8 2 -1 1960 210 158 0.107 0.752
1 1 1 1734 183 122 0.106 0.667
6 1 1 2305 242 142 0.105 0.587
3 1 -1 2759 291 197 0.105 0.677
4 2 1 2722 284 187 0.104 0.658
5 1 0 2953 307 170 0.104 0.554
5 0 1 2022 211 131 0.104 0.621
3 2 -2 1617 167 123 0.103 0.737
5 0 2 1563 161 104 0.103 0.646
7 0 2 1505 154 100 0.102 0.649
2 2 1 2973 302 203 0.102 0.672
7 1 0 2241 227 132 0.101 0.581
1 2 0 12013 1219 871 0.101 0.715
6 2 -1 2179 220 147 0.101 0.668
8 0 1 1612 156 105 0.097 0.673
6 1 0 2471 238 140 0.096 0.588
8 0 2 1504 145 87 0.096 0.600
9 1 -1 1662 158 116 0.095 0.734
3 2 2 2066 196 136 0.095 0.694
4 1 0 3624 338 196 0.093 0.580
4 1 1 2794 259 151 0.093 0.583
4 2 0 3477 325 219 0.093 0.674
7 1 -1 2099 194 126 0.092 0.649
9 2 -1 1810 165 141 0.091 0.855
4 2 -1 2504 222 147 0.089 0.662
5 1 -1 2382 210 124 0.088 0.590
3 1 -2 1708 149 99 0.087 0.664
5 0 0 2567 224 137 0.087 0.612
6 1 -1 2322 202 124 0.087 0.614
5 2 -2 1805 156 108 0.086 0.692
2 1 1 2786 233 127 0.084 0.545
2 2 0 6841 575 378 0.084 0.657
1 2 1 2720 229 159 0.084 0.694
3 0 0 4504 376 227 0.083 0.604
4 1 -1 2735 227 143 0.083 0.630
6 0 1 1963 162 101 0.083 0.623
3 0 -1 2489 204 132 0.082 0.647
8 1 -1 1960 158 111 0.081 0.703
5 0 -1 2131 173 106 0.081 0.613
2 0 1 2195 178 98 0.081 0.551
6 0 0 2267 181 106 0.080 0.586
7 0 1 1782 143 89 0.080 0.622
4 0 0 3498 276 173 0.079 0.627
2 1 0 7574 588 324 0.078 0.551
2 1 -1 2785 212 115 0.076 0.542
6 0 2 1608 121 83 0.075 0.686
2 2 -1 2519 183 125 0.073 0.683
4 1 -2 1948 140 93 0.072 0.664
6 0 -1 2120 147 94 0.069 0.639
6 2 -2 1773 122 92 0.069 0.754
7 0 0 1945 134 81 0.069 0.604
4 0 -1 2538 167 117 0.066 0.701
8 0 0 1858 123 78 0.066 0.634
9 0 -1 1523 99 71 0.065 0.717
6 1 -2 1931 126 96 0.065 0.762
5 0 -2 1636 105 69 0.064 0.657
4 2 -2 1749 110 70 0.063 0.636
10 0 0 2578 162 110 0.063 0.679
9 0 0 1640 101 72 0.062 0.713
5 1 -2 1915 115 77 0.060 0.670
4 0 1 2348 137 87 0.058 0.635
2 0 -1 2512 142 79 0.057 0.556
2 0 0 6983 390 231 0.056 0.592
7 0 -1 1806 100 55 0.055 0.550
2 1 -2 1505 82 50 0.054 0.610
4 0 -2 1739 94 58 0.054 0.617
7 1 -2 1779 93 73 0.052 0.785
7 2 -2 1743 87 71 0.050 0.816
8 0 -1 1681 75 54 0.045 0.720
6 0 -2 1704 76 54 0.045 0.711
8 2 -2 1687 74 67 0.044 0.905
8 1 -2 1731 74 56 0.043 0.757
9 1 -2 1698 65 65 0.038 1.000
7 0 -2 1593 48 37 0.030 0.771
8 0 -2 1514 44 34 0.029 0.773
9 2 -2 1642 20 20 0.012 1.000

I’m not surprised that the most likely situation for attempting a steal is with the lead-off man on and the score tied in the first inning. That’s pretty much what lead-off men are designed to do: get on base and steal if they have the opportunity. Most of the other high steal attempt situations are late in the game with two outs and the score fairly close, but not trailing. In fact, in the top 34 rows, the only trailing situation is in the third inning, with two out and the offense trailing by one. My guess is these are situations in which the lead-off man reached the second time around, and it’s worth him getting in scoring position for the big guns. The low attempt situations mostly come late in the game with teams trailing by at least two runs.

What impresses me here is that managers really do know how to use the stolen base. They tend to run when one run is important, or early on when their best thief reaches base. Looking at the list, there are very few situations in which I’d ask, “Why are you running often there?” The consensus manager, if you will, has a pretty good grasp of when to run.
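One way to turn the table into expected attempts is to look up the attempt probability for every situation a player faces and add them up. The sketch below assumes that is how the expectation is formed, which is my reading rather than a confirmed detail of the model; the names `bin_key`, `attempt_probability` and `expected_attempts` are made up for illustration, and only four rows of the table are included.

```python
# (inning bin, outs, score-difference bin) -> (situations, SB attempts),
# copied from four rows of the table above.
TABLE = {
    (1, 0, 0): (14212, 2416),
    (9, 2, 0): (1680, 248),
    (10, 2, 0): (2197, 290),
    (9, 2, -2): (1642, 20),
}


def bin_key(inning: int, outs: int, score_diff: int):
    """Apply the binning rules: extra innings -> 10, leads clamped to -7..7."""
    return (min(inning, 10), outs, max(-7, min(7, score_diff)))


def attempt_probability(inning: int, outs: int, score_diff: int) -> float:
    """Observed attempt rate for the bin; raises KeyError for bins not listed."""
    situations, attempts = TABLE[bin_key(inning, outs, score_diff)]
    return attempts / situations


def expected_attempts(situations_faced) -> float:
    """Sum of per-situation attempt probabilities (assumed form of the model)."""
    return sum(attempt_probability(*s) for s in situations_faced)


# Three situations: first inning tied with no outs, 12th inning tied with
# two outs, and ninth inning down two with two outs.
faced = [(1, 0, 0), (12, 2, 0), (9, 2, -2)]
total = expected_attempts(faced)
print(round(total, 3))
```

A full version would need every bin, including the combinations with fewer than 1500 situations that the table leaves out.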

The next installment examines which players run often, and which players cling to first.

As always, I’m interested in your feedback. You can follow the series here.

March 9, 2010

Probabilistic Model of Steals, Score Difference

Monday I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. This post examines the third parameter of interest, the score difference.

To review, I’m only concerned with the pure steal situation, only a runner on first. I define score difference from the point of view of the offensive player (Offensive score – defensive score). In looking at the data, a lead of seven runs in either direction seemed to be the point where teams really stopped running, so any lead of seven runs or greater is either placed in the 7 or -7 bin. The following table shows the probability of attempting a steal based on the score difference, and the success rate for each bin:

Score Difference Situations SB Attempts Steals Prob. of an Attempt SB Pct.
4 18521 2237 1549 0.121 0.692
3 27224 3172 2128 0.117 0.671
2 39416 4428 2909 0.112 0.657
0 126001 13784 9148 0.109 0.664
1 56503 5998 3883 0.106 0.647
-1 58328 5216 3519 0.089 0.675
5 12765 967 705 0.076 0.729
-2 42312 2518 1833 0.060 0.728
-3 29068 1354 1064 0.047 0.786
-4 19975 757 602 0.038 0.795
6 8567 262 194 0.031 0.740
-5 13762 394 309 0.029 0.784
-6 8921 180 156 0.020 0.867
-7 16025 146 128 0.009 0.877
7 14457 46 33 0.003 0.717

This table answers the question, “What lead is too big to steal?” The answer is that teams with a five-run lead stop running. What was counterintuitive to me is that the probability of a steal attempt goes up with bigger leads until that point. I suspected that -1, 0, 1 would be the three most probable scores for trying a steal, but instead, 4, 3, and 2 lead the way. Since leads of 3 and 4 runs result in a higher stolen base percentage than closer scores, I have to believe runners are taking advantage of lax defense. I suspect the conventional wisdom is that teams don’t run with a three-run lead, and that might give a good base stealer an edge if the pitcher doesn’t try to hold him close.

On the trailing end, the bigger the deficit, the less likely a team will try a steal. They will find success more often, as the leading team probably doesn’t defend against the steal. This is a great reason to do this research. If a catcher or pitcher is giving up steals with a big lead, he’s not really hurting his club. It’s also a great way for a runner to pad his stolen base statistics, running when way down.
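For readers who want to play with the numbers, here is a small sketch of the score-difference binning and the ranking by attempt probability. The counts are copied from the table above; the function names are only illustrative.

```python
def bin_score_diff(offense_runs: int, defense_runs: int) -> int:
    """Score difference from the offense's view, clamped to the -7..7 bins."""
    return max(-7, min(7, offense_runs - defense_runs))


# score-difference bin -> (situations, SB attempts, steals), copied from the table
SCORE_TABLE = {
    4: (18521, 2237, 1549),
    3: (27224, 3172, 2128),
    2: (39416, 4428, 2909),
    0: (126001, 13784, 9148),
    1: (56503, 5998, 3883),
    -1: (58328, 5216, 3519),
    5: (12765, 967, 705),
    -2: (42312, 2518, 1833),
    -3: (29068, 1354, 1064),
    -4: (19975, 757, 602),
    6: (8567, 262, 194),
    -5: (13762, 394, 309),
    -6: (8921, 180, 156),
    -7: (16025, 146, 128),
    7: (14457, 46, 33),
}


def attempt_probability(diff_bin: int) -> float:
    """Share of situations in this bin where the runner attempted a steal."""
    situations, attempts, _ = SCORE_TABLE[diff_bin]
    return attempts / situations


# Bins ordered from most to least likely to see a steal attempt.
ranked = sorted(SCORE_TABLE, key=attempt_probability, reverse=True)
print(ranked[:3])   # the three bins with the highest attempt probability
print(ranked[-1])   # the bin with the lowest attempt probability
```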

As always, I’m interested in your feedback. The next installment will combine inning, outs and score difference into a complete model. You can follow the series here.

March 8, 2010

Probabilistic Model of Steals, Outs

Earlier today I introduced the idea of a Probabilistic Model of Steals. In that post, I looked at how the stage of the game, represented by the inning, changed the probability of attempting a steal, and the success rate on those attempts. This post examines the second parameter of interest, outs.

To review, I’m only concerned with the pure steal situation, only a runner on first. The following table shows the probability of attempting a steal based on the number of outs, and the success rate for each number of outs:

Outs Situations SB Attempts Steals Prob. of an Attempt SB Pct.
2 171554 15290 10823 0.089 0.708
1 171627 14928 9802 0.087 0.657
0 148664 11241 7535 0.076 0.670

The farther along in an inning in terms of outs, the more likely a runner will steal. This makes sense, as getting a runner in scoring position with two out means he’ll likely score on a single. Note, too, that runners succeed more often in this situation, as getting the batter becomes the primary task of the defense. Even if the runner reaches second base, the offensive team only has one out to spend to get him home. It appears teams are much more willing to defend against a steal in a one-out situation, as shown by the difference in success rates.

Teams seem perfectly willing to wait to see if a situation develops with a man on first and no one out. The idea that a runner caught stealing harms the chances for a big inning is reflected in this number. Note how the SB Pct. lies between the other two outs situations. Teams don’t want to run themselves out of a big inning, so they don’t run as often and they try to run when they have a better chance of success.
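The post does not report a significance test, but one rough way to check whether the gap in success rates between two outs and one out is more than noise is a pooled two-proportion z-test. This is a sketch under the usual independence assumptions, which steal attempts by the same runners and teams may not fully satisfy.

```python
from math import sqrt


def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z statistic for the difference of two proportions, using a pooled rate."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / standard_error


# Steals and attempts copied from the outs table above.
# Two outs: 10823 steals in 15290 attempts. One out: 9802 steals in 14928 attempts.
z = two_proportion_z(10823, 15290, 9802, 14928)
print(f"z = {z:.2f}")
```

With samples this large, z comes out well above the usual cutoff of about 2, so the difference in success rates is unlikely to be chance alone.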

As always, I’m interested in your feedback. The next installment will look at the score difference.

March 8, 2010

Probabilistic Model of Steals, Innings

The MIT Sloan Sports Analytics Conference gets me thinking about sabermetrics, and the one big thing that the conference brought out was the need for better ways to analyze catchers. Since I like probabilistic models, I thought I would try to apply them to stolen bases. The goal is to get an idea of which catchers are good at stopping the running game, but in the context of the situation, the base runner and the pitcher.

It’s actually tough to come up with a name for this series. Probabilistic Model of Steals works out to PMS. Steal Models abbreviates SM. I’m going with Probabilistic Model of Steals, and I’ll try to avoid the acronym.

I’m going to start by focusing on one particular situation, man on first base only. Double steals and steals of second with men on first and third are more strategic than simply runner versus defense.

The data is from Retrosheet, 1996-2008. I haven’t downloaded the 2009 data yet. I start in 1996, since that represents the fourth year of the latest offensive era. Managers should have adjusted their steal strategies by then.

My idea is to look at both the probability of an attempt and the probability of success given an attempt, based on three parameters: the inning, the outs, and the score difference. The score difference will be from the view of the offensive team. I’m going to start with the innings. I group all extra innings into bin 10, since all extra innings basically start with the same situation, the score tied.
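The tallying behind these tables can be sketched as follows. The event format, a tuple of inning, whether a steal was attempted, and whether it succeeded, is a simplification I made up for illustration; the actual Retrosheet records carry much more detail.

```python
from collections import defaultdict


def inning_bin(inning: int) -> int:
    """Group all extra innings into bin 10."""
    return min(inning, 10)


def tally(events):
    """Count situations, attempts and steals per inning bin.

    events: iterable of (inning, attempted, stole) tuples, one per situation.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # situations, attempts, steals
    for inning, attempted, stole in events:
        row = counts[inning_bin(inning)]
        row[0] += 1
        if attempted:
            row[1] += 1
            if stole:
                row[2] += 1
    return {bin_: tuple(row) for bin_, row in counts.items()}


def rates(situations: int, attempts: int, steals: int):
    """Return (probability of an attempt, success rate given an attempt)."""
    p_attempt = attempts / situations
    sb_pct = steals / attempts if attempts else float("nan")
    return p_attempt, sb_pct


# Toy data: two first-inning situations and two extra-inning situations.
events = [(1, True, True), (1, False, False), (11, True, False), (13, False, False)]
print(tally(events))

# First-inning totals from the table in this post.
p_attempt, sb_pct = rates(56746, 7347, 5132)
print(round(p_attempt, 3), round(sb_pct, 3))
```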

The following table shows the probability of attempting a steal based on the inning, and the success rate for each inning:

Inning Situations SB Attempts Steals Prob. of an Attempt SB Pct.
1 56746 7347 5132 0.129 0.699
3 54200 5352 3607 0.099 0.674
10 9149 899 645 0.098 0.717
5 54391 4671 3110 0.086 0.666
4 55092 4488 2909 0.081 0.648
6 55599 4278 2869 0.077 0.671
2 54103 4155 2557 0.077 0.615
7 55178 4102 2846 0.074 0.694
8 55609 3764 2688 0.068 0.714
9 41778 2403 1797 0.058 0.748

I notice two forces at work here on the probability of an attempt: the closeness of the game and the construction of the lineup. Baseball managers like to put players who can steal at the top of the lineup. They’re more likely to get on base in the first inning. The slower power hitters in the middle of the order tend to bat in the second inning, so the difference between running in the first and running in the second is huge. The probability comes back up in the third inning, when the top of the lineup gets a chance to bat again. By the sixth inning, lineups are random enough that attempts just fall off from there.

I believe the fall off in later innings comes from the game becoming more decided. If one team is up 3 or 4 runs late, there’s not a big incentive to steal. The out becomes too costly. That’s also reflected in the SB Pct. numbers, with more success in the late innings. Part of that may be the defense not caring as much; they’ll give up a base rather than throw the ball into centerfield. They don’t hold the runner, as preventing another hit becomes more important than stopping a steal.

Things change in extra innings, however, as one-run strategies come to the fore again, like in the first inning when the game is also close. Extra innings offer the third-highest probability of a steal attempt when a man is on first. The stolen base percentage is higher, too, as a failure to succeed here is much more costly than a failure in the early innings. A team that fails to score may not get another chance to win the game.

As always, I’m interested in your feedback. The next installment will look at outs.
