Category Archives: Statistics

August 5, 2025

The Strikeout Pitch

It’s the change of speed from fast to slow that tends to get batters striking out:

I like to think about this in terms of pattern recognition errors. The pitchers train the batter’s recognizer on a fastball, then disrupt the recognizer with the changeup.

July 13, 2025

In the Zone

Here is an excellent thread on the physics of heat maps:

June 12, 2025

Situational Fielding

I saw this via X this morning, a study of situational fielding. It starts out with an example of a batter getting a hit to the outfield, but one fielder dives and the ball goes by him, the other plays it on a hop and holds the runners to one base:

In the eyes of OAA these plays are graded the same: a play can either be a hit or an out, and neither was an out, but we know that one fielder hurt his team while the other helped. We know this because we can play these little counterfactual thought experiments about two fielders in two worlds.


This is all about a fielding stat which puts a number to those little thought experiments. It compares what the player did to all the things he could’ve done, in not just 2 worlds but all the possible worlds, and grades him on how much better or worse he did than expected.

https://situational-fielding.streamlit.app/

When I saw Juan Soto at the bottom of the list I thought, “Yes, he does the dive and miss.” Pete Crow-Armstrong is at the top. Check it out.

June 6, 2025 March 14, 2025

Happy Pi Day!

It’s March 14th, I’m living in Apartment 314, and this afternoon I’m eating pie.

Four pitchers since 1920 posted Pi ERAs in a season, minimum of 25 innings:

  • Lon Warneke, 1940, 3.142241379
  • Dizzy Trout, 1945, 3.142083897
  • Jerry Koosman, 1970, 3.141509433
  • Mike Hampton, 2000, 3.142419601

I’m very happy Koosman gets the most digits of Pi right, as he was one of my favorites growing up.

Eight hitters in that era managed 1/10 of Pi, minimum 100 AB:

  • Lloyd Waner, 1931, 0.314243759
  • Augie Galan, 1935, 0.314241486
  • Luke Appling, 1941, 0.314189189
  • Bob Horner, 1979, 0.314168377
  • Carlos Baerga, 1995, 0.314183123
  • Jim Thome, 1995, 0.314159292
  • Juan Gonzalez, 1996, 0.314232902
  • Carlos Lee, 2008, 0.314220183

Thome’s 1995 season gets seven digits correct, making him the king of Pi! He went 142/452 in that strike-shortened season.
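The digit counts are easy to check by hand, but a small sketch makes the comparison explicit. The helper name `leading_pi_digits` is made up for this illustration; it compares the leading significant digits of a stat (written as a string) against the digits of Pi.

```python
PI_DIGITS = "314159265358979"  # leading digits of Pi, decimal point removed


def leading_pi_digits(stat: str) -> int:
    """Count how many leading significant digits of `stat` match Pi."""
    digits = stat.replace(".", "").lstrip("0")
    count = 0
    for ours, theirs in zip(digits, PI_DIGITS):
        if ours != theirs:
            break
        count += 1
    return count


thome_1995 = f"{142 / 452:.9f}"  # Thome's 142-for-452 season
print(thome_1995, leading_pi_digits(thome_1995))   # 0.314159292 7
print(leading_pi_digits("3.141509433"))            # Koosman's 1970 ERA: 5
```

The same function works for batting averages and ERAs because it ignores the decimal point and any leading zero.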

December 27, 2024

Understanding Pitch Values

Tom Tango posts a very nice explanation of the measure of a particular pitch. He breaks out wOBA, which only gets calculated on the final pitch, versus the run value, which is calculated on all pitches of that type. He gives the example of Yusei Kikuchi and his four seam fastball. Overall, the pitch is a net positive for Kikuchi, but on balls in play, it is well below average.

This brings up a larger problem in measuring pitch effectiveness. We are not in a One-Punch Man Universe:

Each separate attack must be delivered with intent to kill.

That’s not how a plate appearance works. Pitchers and catchers have a plan for each hitter. In most cases, the first three pitches likely are not designed to defeat the batter, but to put the batter in a situation where he can be defeated. In many cases, the fastball is used to prime the batter to be defeated by an off-speed pitch.

So when we measure wOBA on pitches, we don’t know the intent of the pitch. I suspect the data is telling us that Kikuchi does a good job using his fastball to set up batters. Some batters are not going to go for the setup, however, and recognize the fastball in that situation and put it in play to good results.

Note that it’s difficult to infer intent. Even on a 3-2 pitch, if a base is open, the pitcher may be willing to sacrifice a walk rather than give a batter a pitch in the strike zone. It’s possible that a pitch with a low run value might in fact be valuable in the chain of pitches that leads to an out.

December 24, 2024

Stanton’s Swing

Tom Tango analyzes Giancarlo Stanton’s swing, charting acceleration and bat speed on contact and misses. He gives us an idea of just how much force Stanton generates:

Math interlude: We can see he gains just over 3 mph (3.2) per frame, or 3.2 mph per 3.33 msec of acceleration. 3.2 mph is 4.7 feet per second. Therefore 4.7 ft/s per 3.33 msec is 1410 feet per second-squared. Gravity is 32.174 feet per second-squared, or one g-force. Giancarlo is 1410 / 32.174 = 44. That’s how much acceleration Giancarlo generates with his swings: 44 g-force.

TangoTiger.com

It makes you feel sorry for the ball.
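Tango’s arithmetic can be retraced in a few lines. This is only a sketch of the unit conversion in the quote; the 300 frames per second is inferred from the quoted 3.33 msec per frame, and the function name is made up for illustration.

```python
MPH_TO_FPS = 5280 / 3600   # one mile per hour expressed in feet per second
G_FT_PER_S2 = 32.174       # standard gravity in feet per second-squared


def swing_g_force(mph_gain_per_frame: float, frames_per_second: float) -> float:
    """Convert a per-frame gain in bat speed into acceleration in g."""
    delta_v_fps = mph_gain_per_frame * MPH_TO_FPS   # speed gained per frame, ft/s
    accel = delta_v_fps * frames_per_second         # ft/s^2
    return accel / G_FT_PER_S2


g = swing_g_force(3.2, 300)   # 3.2 mph per frame, one frame every 3.33 msec
print(round(g, 1))
```

Without rounding the intermediate steps the result lands a shade under 44 g, which agrees with the quoted figure.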

December 9, 2024

Era-Adjusted WAR

Via Tom Tango’s Twitter feed:

The site is Era Adjusted Baseball Stats, and the link goes to the top 100 adjusted career WAR.

These era-adjusted statistics are obtained from Full House Modeling. This model computes era-adjusted statistics through a principled balancing of how well a player performed “vs their peers” and the size of the MLB talent pool. Under this model, great all-time statistics requires that an MLB player is both better than their peers and played during a time in which the talent pool is large. In this way, the model constructs an even playing field that extends across eras.

eckeraadjustment.web.illinois.edu

It’s a fascinating list, with Barry Bonds, Roger Clemens, Willie Mays, Babe Ruth, and Henry Aaron the top five. More recent players tend to rank higher than we usually see. The second and third ranked pitchers, for example, are Greg Maddux (9th) and Randy Johnson (11th). Bert Blyleven ranks 22nd, showing once again that the campaign to get him elected to the Hall of Fame was worth it.

Johnny Bench is the highest ranked catcher (65th). I saw someone complaining the other day that Bench was not the greatest catcher of all time, but this list seems to confirm he was.

One of my pet peeves in previous Hall of Fame voting was that both Graig Nettles and Willie Randolph fell off after the first ballot. While I didn’t believe either of them was a Hall of Famer, I did think they deserved enough votes to stay on the ballot. This analysis ranks them 92nd and 100th. Maybe a veterans committee should give them another look. Another sabermetric favorite, Bobby Grich, ranks 82nd.

Have fun with the site!

November 22, 2024

Carter and RBI

Exploring Baseball Data with R examines Joe Carter’s RBI totals during his full seasons from 1985 to 1998. There is a very nice visualization of RBI opportunities versus other players in the league, Carter being near the top in most of his seasons. He always seemed to wind up batting behind some players who were very good at getting on base.

The Day by Day Database calculates RBI percentage, which does something very similar. In that era, for hitters with at least 4500 runners on base, Carter is middle of the pack in percent of those runners driven in. The player from those years who might have been underrated in the ability to drive in runners was Kirby Puckett. He drove in 18.7% of his base runners, compared to 16.8% for Carter.

In the last 14 years, very few players saw 4500 baserunners in front of them. Nolan Arenado drove in 17.5% at the high end, and Carlos Santana, who saw the most base runners, drove in 13.6%.

November 10, 2024

Relief Changes

The use of relief pitchers changed greatly versus twenty years ago. The trend appears to be more pitchers throwing fewer innings. This study compares two groups in two eras.

The groups are pitchers with at least 60 IP in a season and pitchers with between 30 and 59 2/3 innings in a season. This is a proxy for top tier versus second tier relievers. The eras are 2001-2004 versus 2021-2024. I wanted the same number of teams in the study (30), and going from 2001-2004 gave the two 1998 expansion teams a chance to settle in.

First, a look at how the 60 inning pitchers changed.

| 60+ Innings in Relief | 2001-2004 | 2021-2024 |
| --- | --- | --- |
| Pitcher Seasons | 380 | 335 |
| Innings | 27956 1/3 | 22516 2/3 |
| K per 9 | 7.57 | 9.60 |
| BB per 9 | 3.35 | 3.18 |
| HR per 200 IP | 19.3 | 19.6 |
| FIP | 3.79 | 3.30 |

Relievers with at least 60 IP in a Season

There were 45 fewer pitcher seasons in the last four years than 20 years ago, and about 5,440 fewer innings, with the average workload falling from about 73.6 to about 67.2 innings per pitcher season. The rise in strikeout rate matched the overall rise in strikeouts of 2 per 9 IP, but these pitchers remained well above the league averages: league K per 9 went from 6.57 to 8.69 during the period. Walks per 9 dropped from 3.33 to 3.21 for the league, so elite relievers don’t get that way by avoiding free passes.

What really stands out for these pitchers is the home runs. Even though HR per 200 IP rose slightly for this group, there was a huge jump for the league going from 24.4 per 200 IP to 26.2 per 200 IP! They only showed a slight rise in a much higher HR environment. Part of that comes from pitchers no longer batting, but part comes from better launch angles making up for the lack of performance enhancing drugs.

So FIP is better, down 0.5 runs per game. Overall, league scoring is down 0.3 per team game, from 4.73 R/G to 4.46 R/G.

How did the second tier relievers fare?

| 30 to 59 2/3 Innings in Relief | 2001-2004 | 2021-2024 |
| --- | --- | --- |
| Pitcher Seasons | 435 | 683 |
| Innings | 19399 1/3 | 31132 2/3 |
| K per 9 | 7.12 | 9.26 |
| BB per 9 | 3.80 | 3.63 |
| HR per 200 IP | 22.8 | 23.0 |
| FIP | 4.27 | 3.75 |

Relievers with between 30 and 59 2/3 IP in a Season

Notice that teams did not simply take some innings away from the top tier pitchers and move 45 seasons into this group. They added 200 pitcher seasons on top of that. They added all those pitcher seasons and got the same improvement in performance as they did for the top tier. The average 30-60 IP reliever threw 44.6 innings in the early part of the century, 45.6 innings the last four years. Except for walks, they are better than pitching overall.

It’s the emergence of this group as a force that allowed teams to enforce the two times through the lineup heuristic for starting pitchers. To a first approximation, no one gets tired anymore.
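The workload per pitcher season can be recomputed directly from the two tables above. The sketch below assumes only the pitcher-season and innings totals shown there; the `innings` helper and the `GROUPS` layout are made up for illustration.

```python
from fractions import Fraction


def innings(text: str) -> Fraction:
    """Parse an innings total such as '27956 1/3' into an exact fraction."""
    return sum((Fraction(part) for part in text.split()), Fraction(0))


# (pitcher seasons, innings) copied from the two tables above
GROUPS = {
    ("60+ IP", "2001-2004"): (380, "27956 1/3"),
    ("60+ IP", "2021-2024"): (335, "22516 2/3"),
    ("30 to 59 2/3 IP", "2001-2004"): (435, "19399 1/3"),
    ("30 to 59 2/3 IP", "2021-2024"): (683, "31132 2/3"),
}

per_season = {
    key: float(innings(ip) / seasons) for key, (seasons, ip) in GROUPS.items()
}

for (tier, era), value in per_season.items():
    print(f"{tier:16} {era}: {value:.1f} innings per pitcher season")
```

Using `Fraction` keeps the thirds of an inning exact until the final division.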

November 3, 2024

Young Slugger

David Laurila at FanGraphs speaks with Bryce Eldridge, a young hitting prospect for the Giants. He talks about being a slugger and a complete hitter:

“Growing up, I kind of had a natural, sweet lefty swing,” he told me. “As I kept getting bigger I had to keep working at it, working with my coaches back home, working on trying to stay short. Consistency is the biggest key, especially for a guy my size. The more I can stay consistent with that stroke, the better it will be.”

And if you’re wondering, yes, he considers himself a power hitter.

“For sure,” Eldridge replied when asked if he identifies as such. “I think I’m a pretty complete hitter, although the strikeout numbers [132 this season] might point to me being a power hitter. Being able to put the ball in the air is my game. If I can get it in the air, it flies.”

FanGraphs.com

The two most interesting young sluggers in the majors right now are the juniors, Bobby Witt Jr. and Vladimir Guerrero Jr. Both slugged over .500 in 2024 without striking out a ton. On top of that, they posted high BABIPs, which led to high batting averages. A high BABIP with a low K rate and a decent number of home runs can lead to an extremely high batting average. Here is the list of players who hit at least .370 (technically, .3695) since 1941:

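| Season | First Name | Last Name | AB | BABIP | BA | BA to BABIP Ratio | HR | Ks | HR to K Ratio |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1941 | Ted | Williams | 456 | 0.378 | 0.406 | 1.07 | 37 | 27 | 1.4 |
| 1994 | Tony | Gwynn | 419 | 0.389 | 0.394 | 1.01 | 12 | 19 | 0.6 |
| 1980 | George | Brett | 449 | 0.368 | 0.390 | 1.06 | 24 | 22 | 1.1 |
| 1957 | Ted | Williams | 420 | 0.367 | 0.388 | 1.06 | 38 | 43 | 0.9 |
| 1977 | Rod | Carew | 616 | 0.408 | 0.388 | 0.95 | 14 | 55 | 0.3 |
| 1999 | Larry | Walker | 438 | 0.363 | 0.379 | 1.04 | 37 | 52 | 0.7 |
| 1948 | Stan | Musial | 611 | 0.354 | 0.376 | 1.06 | 39 | 33 | 1.2 |
| 2000 | Todd | Helton | 580 | 0.357 | 0.372 | 1.04 | 42 | 61 | 0.7 |
| 2000 | Nomar | Garciaparra | 529 | 0.378 | 0.372 | 0.98 | 21 | 50 | 0.4 |
| 2004 | Ichiro | Suzuki | 704 | 0.399 | 0.372 | 0.93 | 8 | 63 | 0.1 |
| 1997 | Tony | Gwynn | 592 | 0.363 | 0.372 | 1.02 | 17 | 28 | 0.6 |
| 1993 | Andres | Galarraga | 470 | 0.399 | 0.370 | 0.93 | 22 | 73 | 0.3 |
| 1987 | Tony | Gwynn | 589 | 0.383 | 0.370 | 0.97 | 7 | 35 | 0.2 |
| 2002 | Barry | Bonds | 403 | 0.330 | 0.370 | 1.12 | 46 | 47 | 1.0 |

.370 BA since 1941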

Ted Williams in 1941 was pretty amazing.

Note that hitters with a HR to K ratio over 0.6 then raise their BA over their BABIP. Witt and Guerrero both had a ratio of 0.3 in 2024. They each may be able to add ten home runs next season, but they are going to need to pull their Ks down quite a bit, to somewhere less than 70, to get the BA boost.

Also, I didn’t realize how strikeouts really hurt Galarraga in 1993, and how a lack of home runs hurt Ichiro in 2004 and Carew in 1977. All three had BABIPs near .400, but could not pull their batting averages near or over those BABIPs.
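The 0.6 threshold can be checked against the table itself. This sketch simply re-types the BABIP, BA, HR, and K columns from the table above and tests whether every season with HR/K of at least 0.6 also has BA above BABIP; the 0.6 cutoff is the one named in the text, not a fitted value.

```python
# (season, player, BABIP, BA, HR, K) re-typed from the table above
SEASONS = [
    (1941, "Williams", 0.378, 0.406, 37, 27),
    (1994, "Gwynn", 0.389, 0.394, 12, 19),
    (1980, "Brett", 0.368, 0.390, 24, 22),
    (1957, "Williams", 0.367, 0.388, 38, 43),
    (1977, "Carew", 0.408, 0.388, 14, 55),
    (1999, "Walker", 0.363, 0.379, 37, 52),
    (1948, "Musial", 0.354, 0.376, 39, 33),
    (2000, "Helton", 0.357, 0.372, 42, 61),
    (2000, "Garciaparra", 0.378, 0.372, 21, 50),
    (2004, "Suzuki", 0.399, 0.372, 8, 63),
    (1997, "Gwynn", 0.363, 0.372, 17, 28),
    (1993, "Galarraga", 0.399, 0.370, 22, 73),
    (1987, "Gwynn", 0.383, 0.370, 7, 35),
    (2002, "Bonds", 0.330, 0.370, 46, 47),
]

THRESHOLD = 0.6  # HR-to-K ratio named in the text

for season, player, babip, ba, hr, k in SEASONS:
    ratio = hr / k
    boosted = ba > babip
    print(f"{season} {player:12} HR/K={ratio:.2f} BA>BABIP={boosted}")

# True when the threshold separates the two groups cleanly in this table
rule_holds = all((hr / k >= THRESHOLD) == (ba > babip)
                 for _, _, babip, ba, hr, k in SEASONS)
print(rule_holds)
```

Fourteen seasons is a small sample, so this shows only that the rule of thumb fits this list, not that 0.6 is a hard boundary.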

October 29, 2024

Adjusting Our Priors

I had a discussion with Claude.AI today. I asked the following question:

Can you explain Bayesian updating. If I have two teams that I believe are evenly matched, with a prior probability of 0.5, how do I update that prior if one team wins three games in a row against the other?

Here is Claude’s response:

Let me walk you through Bayesian updating in this context.

Let’s break this down step by step:

  1. Initial Prior:
  • P(Team A is better) = 0.5
  • P(Team A is not better) = 0.5
  2. We need to consider:
  • The likelihood of seeing 3 consecutive wins IF Team A is truly better
  • The likelihood of seeing 3 consecutive wins IF teams are actually equal

Let’s make some reasonable assumptions:

  • If Team A is truly better, they might win any given game with 65% probability
  • If teams are equal, each has 50% probability of winning any game
  3. Calculating likelihoods:
  • P(3 wins | Team A is better) = 0.65 × 0.65 × 0.65 = 0.274
  • P(3 wins | Teams equal) = 0.5 × 0.5 × 0.5 = 0.125
  4. Using Bayes’ Theorem:
    P(A is better | 3 wins) = P(3 wins | A is better) × P(A is better) / P(3 wins)

Where P(3 wins) = P(3 wins | A is better) × P(A is better) + P(3 wins | Teams equal) × P(Teams equal)
= 0.274 × 0.5 + 0.125 × 0.5 = 0.200

Therefore:
P(A is better | 3 wins) = 0.274 × 0.5 / 0.200 = 0.685

So after three consecutive wins, we would update our belief from 50% to about 68.5% that Team A is the better team.

Would you like me to explain how the probabilities would change with different assumptions about how much better one team might be?

I thought the 0.65 probability was too high. So I asked for the analysis based on a .55 probability:

I’ll recalculate using 55% as the probability of Team A winning each game if they’re truly better.

  1. Initial Prior remains:
  • P(Team A is better) = 0.5
  • P(Team A is not better) = 0.5
  2. New probabilities:
  • If Team A is better: 55% chance to win each game
  • If teams are equal: 50% chance to win each game
  3. New likelihood calculations:
  • P(3 wins | Team A is better) = 0.55 × 0.55 × 0.55 = 0.166
  • P(3 wins | Teams equal) = 0.5 × 0.5 × 0.5 = 0.125
  4. Using Bayes’ Theorem:
    P(3 wins) = 0.166 × 0.5 + 0.125 × 0.5 = 0.146

Therefore:
P(A is better | 3 wins) = 0.166 × 0.5 / 0.146 = 0.569

With this more conservative assumption, three consecutive wins would only move our belief from 50% to about 57% that Team A is better. This makes intuitive sense – if we believe the skill gap between teams is smaller (55% vs 65% win probability), then we need more evidence (more games) to be confident that one team is actually superior.

This illustrates an important principle of Bayesian updating: the strength of our update depends not just on the evidence (3 wins) but also on how different we expect the world to look under our competing hypotheses. When the hypotheses make very different predictions (65% vs 50%), evidence can shift our beliefs more dramatically than when the hypotheses make similar predictions (55% vs 50%).

I like that a lot better. Treating .569 as a per-game win probability, a team that goes up 3-0 in a series would win the series 96.5% of the time. That’s very close to the .975 actual series win probability for teams going up 3-0 in MLB history. If teams are evenly matched we would expect them to win 93.75% of the time. So there is some evidence that teams that go up 3-0 in a seven game series are a bit better than their opponents, but not by much.
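Both calculations are short enough to sketch in code. The function names here are made up for illustration. The first reproduces the two-hypothesis update from the conversation (a team is either "better" with a fixed per-game win probability, or the teams are equal at 0.5). The second gives the chance that a team already up 3-0 wins at least one of the remaining four games, for a given per-game win probability.

```python
def posterior_better(p_win_if_better: float, wins: int, prior: float = 0.5) -> float:
    """P(Team A is better | A won `wins` straight games), two-hypothesis model."""
    like_better = p_win_if_better ** wins   # likelihood if A really is better
    like_equal = 0.5 ** wins                # likelihood if the teams are equal
    evidence = like_better * prior + like_equal * (1 - prior)
    return like_better * prior / evidence


def series_win_from_3_0(p_game: float) -> float:
    """Chance the team up 3-0 wins at least one of the next four games."""
    return 1 - (1 - p_game) ** 4


print(round(posterior_better(0.65, 3), 3))
print(round(posterior_better(0.55, 3), 3))
print(round(series_win_from_3_0(0.569), 4))
print(series_win_from_3_0(0.5))
```

Carrying full precision through the update gives posteriors of roughly 0.687 and 0.571, slightly above the 0.685 and 0.569 quoted, because the quoted figures round the intermediate likelihoods to three decimals.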

August 26, 2024

As Heard on TV

I have heard broadcast play by play announcers discussing on-base streaks. As you might imagine, it is much easier to get on base by any means than just by a hit. This list is a bit out of date, but the top remains the same. Twice hitters exceeded 70 games in a season. Joe DiMaggio set a record of 74 in 1941 as part of his hitting streak. Ted Williams blew that away in 1949 with an 84 game streak. I don’t think many people were aware of the records at the time, as hits were paramount. Drawing a lot of walks can be detrimental to a hit streak, but they are great if you are trying to generate an on base streak.

The second thing I heard was in a recent Yankees broadcast, where they talked about Juan Soto providing protection for Aaron Judge by being on base so many times in front of him. I actually did a graphic about this at ESPN in 1992. George Bell joined the White Sox that season, near the end of his career, and he was driving in a lot of runs. Robin Ventura and Frank Thomas were batting in front of him, clogging up the bases. Bell saw the most runners on base that season, so opponents could not pitch around him. Bell was still good at generating power, and that led to lots of RBI. The takeaway is that there is more than one way to protect a power hitter.

August 18, 2024

Ranking Skenes

Back on July 30, 2024, Tom Tango and I disagreed over where Paul Skenes ranked among starting pitchers. Tom put him seventh, the Bill James Ranking put him 89th.

The James ranking is more conservative, and I believe rightly so. How can you rank a pitcher or batter among the very best after just a couple of months of duty? Skenes’s rise was indeed impressive, but the league did not have a chance to adjust to the youngster.

Halfway through August, the meteoric rise has stalled. Today he ranks 80th. His last three starts don’t look all that legendary: good, but not great. He produced a 4.15 ERA in 17 1/3 innings, walking eight batters. He struck out batters at a good rate, but not an exceptional one.

The James ranking found a level that Skenes could sustain. I suspect the young star will adjust himself, and keep rising. For now, he appears to be where he belongs.

August 7, 2024

Walk Man

With two out in the bottom of the eighth inning, none on, in game one of the doubleheader between the Angels and the Yankees, LAnaheim manager Ron Washington issued an intentional walk to Aaron Judge. Here’s the data since 1974 on what base situations lead to an intentional walk:

| Base Situation | Intentional Walks |
| --- | --- |
| 000 | 140 |
| 001 | 134 |
| 010 | 27655 |
| 011 | 233 |
| 100 | 7818 |
| 101 | 1587 |
| 110 | 20893 |
| 111 | 3 |

1974 through August 6, 2024

The base situation corresponds to the three bases from the catcher’s perspective, with third base on the left and first base on the right. A one indicates the base was occupied.

So the IBB with the bases empty is on par with a man on first, but Judge’s recent collection of bases empty walks brought that situation a little farther ahead.
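The three-digit base codes are compact but easy to misread, so here is a small sketch that decodes them. It assumes only the convention described above (third base on the left, first base on the right, a one meaning occupied); the function name is made up for illustration.

```python
BASES = ("third", "second", "first")  # left to right, as seen by the catcher


def occupied_bases(code: str) -> list[str]:
    """Translate a code such as '010' into the list of occupied bases."""
    return [base for base, flag in zip(BASES, code) if flag == "1"]


# Intentional walk counts from the table above
IBB_COUNTS = {
    "000": 140, "001": 134, "010": 27655, "011": 233,
    "100": 7818, "101": 1587, "110": 20893, "111": 3,
}

for code, count in sorted(IBB_COUNTS.items(), key=lambda item: -item[1]):
    runners = ", ".join(occupied_bases(code)) or "bases empty"
    print(f"{code}  {runners:22} {count}")
```

Sorted this way, the list is led by a runner on second only, followed by runners on second and third, the two situations where first base is open and a runner is in scoring position.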

July 5, 2024

Complete Steele

Justin Steele of the Cubs pitched a complete game two-hitter Friday afternoon to help defeat the Angels 5-1. He walked two and struck out seven.

That brings the complete game total for the season to 17. I thought that might be pretty good, but the CG rate is down again. The seventeen complete games came in 2621 starts, or in 0.65% of starts. Last season produced 35 CG, which worked out to 0.72%. That was down from 36 CG in 2022. CGs may disappear from the game faster than triples.

July 4, 2024

Hitting Your BABIP

Steven Kwan of the Guardians went three for four Thursday afternoon to raise his batting average to .367. He also raised his BABIP to .379. One of those hits went for his eighth home run of the season. He hit eleven in total during his first two years in the majors.

Kwan right now provides a good example of how batting average is a combination of BABIP and what I call BABnIP, batting average on balls not in play. BABIP is approximately your BA if you never homer and never strike out (not quite, because BABIP includes sacrifice flies in the denominator). Home runs raise your BA above your BABIP; strikeouts knock your BA down. BABnIP is simply HR/(HR+K). What a batter should want for a higher BA is a BABnIP that is higher than his BABIP.

Kwan struck out 21 times this season, giving him a BABnIP of 8/29, or .276.

Note that in 1941, Ted Williams owned a BABIP of .378. That season, he hit 37 home runs and struck out 27 times for BABnIP of .578. Power and contact is a rare skill, and it’s the big reason he was the last .400 hitter.
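To make the "combination" concrete, here is a minimal sketch that treats BA as a weighted average of BABIP and BABnIP. It ignores sacrifice flies, so it is an approximation rather than the official BABIP definition. The 456 at bats for Williams come from the .370 hitters table in the November 3, 2024 post; the function names are made up for illustration.

```python
def babnip(hr: int, k: int) -> float:
    """Batting average on balls not in play: HR / (HR + K)."""
    return hr / (hr + k)


def batting_average(babip: float, hr: int, k: int, ab: int) -> float:
    """Approximate BA as a weighted mix of BABIP and BABnIP (no sac flies)."""
    not_in_play = hr + k            # at bats ending in a homer or a strikeout
    in_play = ab - not_in_play      # every other at bat
    return (babip * in_play + babnip(hr, k) * not_in_play) / ab


print(round(babnip(8, 21), 3))     # Kwan through July 4, 2024
print(round(babnip(37, 27), 3))    # Williams, 1941
print(round(batting_average(0.378, 37, 27, 456), 3))  # Williams, 1941
```

The weights are the shares of at bats that end in play versus not in play, which is why a hitter with few homers and few strikeouts has a BA close to his BABIP.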

June 24, 2024

Best Team in the AL?

We have an odd situation in the AL. The Guardians own the best winning percentage at .653, but they are half a game behind the Yankees at .650.

I love these situations as they provide a moment to explain an important point in judging the standing of teams. First place is based on winning percentage, not on games back. Games back is a measure of the difference in games over .500. In almost all cases they agree on the ranking.

If you look at winning percentage as a long-term outcome, the Guardians would win 106 games at .653, the Yankees would win 105 games at .650. That’s why Cleveland should be considered as having the best record in the AL.

Games back is 1/2*(Difference in games over .500). It’s a chase number. Right now, the Yankees are 24 games over .500 (52-28) while the Guardians are 23 games over .500 (49-26). 1/2*(24-23) = 1/2. This came about because the Yankees played five more games than Cleveland. If the Guardians just made up the five games before the Yankees played again, we’d expect them to go 3-2 (.600), gaining three half games for the wins and losing two half games for the losses, and the teams would be tied.
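The two measures are simple enough to put side by side in a sketch. The records are the ones quoted above; the function names are made up for illustration.

```python
def win_pct(wins: int, losses: int) -> float:
    """Winning percentage, the measure that decides first place."""
    return wins / (wins + losses)


def games_over_500(wins: int, losses: int) -> int:
    return wins - losses


def games_back(leader: tuple[int, int], trailer: tuple[int, int]) -> float:
    """Half the difference in games over .500 between two teams."""
    return (games_over_500(*leader) - games_over_500(*trailer)) / 2


yankees = (52, 28)
guardians = (49, 26)

print(round(win_pct(*yankees), 3), round(win_pct(*guardians), 3))
print(games_back(yankees, guardians))

# If Cleveland goes 3-2 in its five games in hand, the teams are level.
guardians_after = (guardians[0] + 3, guardians[1] + 2)
print(games_back(yankees, guardians_after))
```

The first line of output shows Cleveland ahead on winning percentage even though the second shows it half a game back.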

June 22, 2024

The Long Road to 50

Andrew McCutchen hit the 50th triple of his career as his Pirates beat the Rays 4-3 on Saturday. For McCutchen it was a long wait. His 49th triple came on 4/7/2021. In between he played 446 games with 75 doubles, 66 home runs, but no triples. Congratulations to McCutchen on holding on long enough to reach that milestone. It turns out that since 1920, not too many players hit 400 doubles, 50 triples, and 300 home runs. While there are a lot of Hall of Famers on the list McCutchen is closer to the ones who would be in the Ha

June 1, 2024

Swing Acceleration

Tom Tango posted this graph on X:

I really like how this captures the difference between Jose Altuve and Luis Arraez. Both keep their bats in the strike zone a long time (slow late acceleration). Altuve, however, gets the bat moving quickly at the beginning, and it looks like that allows him to generate more power.

May 13, 2024

Holy Bat Speed!

MLB details new Statcast technology that measures bat speed. One of my favorite graphics here is the list of fastest and slowest average swing speeds, and how both sets are full of good hitters:

Oh, no: Another reason that Statcast doesn’t respect Arraez, right? Not quite. Slower doesn’t always mean worse; sometimes it just means different, in the same way a pitcher with a below-average fastball might still find a way to succeed.

So why does any of this matter? After all, it’s not like very many people would prefer Stanton over Steven Kwan in their lineup, right? It’s sort of like fastball velocity, where you’d obviously prefer to have it than not have it — and in the aggregate, it’s much better to have it — but throwing 98 mph doesn’t automatically make you more productive than someone who throws 92. It’s a tool, not the answer.

Still, the data makes it pretty clear. If you’re not an outlier unicorn in terms of bat control like Kwan or Arraez, you’d probably like to swing hard if you could.

MLB.com

There are hitters who are known for keeping the bat in the zone longer, and they tend to be contact hitters. They start their hips to get their body moving, but they commit their hands very late. This allows them time to correct their swing if they misjudged the pitch. I’ve heard Tony Gwynn and Derek Jeter described this way, but I suspect hitters like Ty Cobb, Joe DiMaggio, Pete Rose, Rod Carew, and Wade Boggs fit the mold as well.

If you look at this graph at Baseball Savant, what you see is that while Arraez’s swing is slow, he squares up the ball better than anyone. The graph also makes clear why Juan Soto is so good, as he combines bat speed with the ability to square up the ball.

It will be nice to see what that graph looks like when we have a couple of thousand swings for each batter.

April 17, 2024

Two Trends

In looking at MLB batter splits for this season, two things stand out to me. The first is that the composite batting order is rather well constructed. The majors put a lot of on-base percentage at the top, and given the stats so far, the fourth hitter is better than the third hitter in both OBP and slugging percentage. That makes sense, as the fourth hitter serves a dual role of cleaning the table, and setting the table for the rest of the order. After the fourth hitter, there is a steady decline in OBP and slugging. The worst hitters still bat ninth, and they are pretty awful this season.

The other thing that stands out is that designated hitters are the best hitters by position. That really hasn’t been true during the history of the DH, even though the defensive spectrum would predict that. It’s possible that with the universal DH, teams and players finally adopted the idea that it’s a real position, rather than a place holder for someone injured, or someone who needs a day off from the field.

April 15, 2024

Lots of Walks

It struck me that the six runs the Padres scored on fourteen walks Sunday night against the Dodgers seemed low. Looking at all games from 2021 on, the linear regression provides the formula runs = (0.55 * Walks) + 2.73. So normally we would expect 10 runs scored in such a game. For the most part, that has not been the case. In that time there were four games with 13 or 14 walks, and the runs scored in those games were 4, 5, 6, and 7. There was also a 17 walk game that led to 17 runs. This makes me wonder if sometimes high walk games are more about the defensive use of the outcome, walking tough hitters to reach ones who are easier to put out.
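As a quick illustration, the regression line quoted above can be applied to the games mentioned. The slope and intercept are the ones in the text; the function name is made up, and only the games with a walk count stated in the text are compared.

```python
def expected_runs(walks: int, slope: float = 0.55, intercept: float = 2.73) -> float:
    """Runs predicted by the walks-to-runs regression line quoted above."""
    return slope * walks + intercept


# (walks, actual runs) for the two games described with exact walk counts
GAMES = [(14, 6), (17, 17)]

for walks, actual in GAMES:
    predicted = expected_runs(walks)
    print(f"{walks} walks: predicted {predicted:.2f}, actual {actual}, "
          f"difference {actual - predicted:+.2f}")
```

The Padres game comes in more than four runs under the line, while the 17 walk game comes in almost five runs over it.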

April 2, 2024

Five Shutout Days

Major League Games produced five shutouts on Monday. Days with lots of shutouts do seem to be connected to weather. In the 30 team era (1998-on), Sept-Oct produced 22 days with at least five shutouts, March-April 18, with May third at 17. Of the summer months, July produced 14 such days, August 12, and June 10.

March 19, 2024

Losing Parks

Cybermetrics points to a Bill James article that explores the idea that teams that play in parks that favor the pitcher are better than teams that play in parks that favor the batter. A quick synopsis is that the more your park favors hitters, the worse the team plays on the road. As with most Bill James work, the article is well worth your time.

This begs the question, how do you avoid extreme parks? Do you not allow teams to play at altitude? Mandate 30 foot fences in the outfield? Deaden the ball?

What if new buildings change the character of your static park? This happened to Wrigley Field. Before the NL built a number of new, fairly neutral stadiums in the 1960s and 1970s, Wrigley wasn’t that extreme. Then the Mets, Dodgers, Pirates, Reds, Cardinals, and Giants all moved into stadiums that tended to favor pitchers, so Wrigley suddenly was an outlier.

A big part of the Cubs’ failure to win a World Series for so many years simply came from other teams lowering the bar for having an extreme stadium!

Teams do tinker with stadiums a lot more than in prior years. The best example recently was Baltimore moving fences back to prevent home runs. I suspect we’ll see more of this. It’s another way to suppress offense that MLB needs to counter.

February 24, 2024 January 14, 2024

Targeting Pitches

Tom Tango uses data on the position of the left wrist of catchers to infer how the backstops set targets for various pitches. This first post in a series deals with inside pitches, and the differences in targeting are small compared to the difference in ball placement up and down.

I could see this being useful to broadcasters. It would be fairly easy to build a model for each catcher, or even each catcher/pitcher combination. Then, if an analyst says, “He wanted that pitch up!” it would more likely be true.

One could also build reports on how often pitchers hit the intended targets, which I suspect would be very useful in evaluating talent, not just for the future, but in season and in game as well. Along with spin rates, FIP, and barrels, hitting the target could add to the valuation of free agents. Within a season, target deterioration could indicate an injury before it becomes serious. Within a game, target deterioration might be a better indication of when to lift a pitcher than the third time through the order.

W. Edwards Deming would love that.

December 30, 2023 December 26, 2023

Hot and Cold

For a while I thought it was possible to make an improvement to the Beat the Streak Picks by determining the current probability of the hotness or coldness of a hitter. It strikes me as a hidden Markov model (HMM). The idea would be to assign one of three states to a player at a given time: cold, normal, or hot. Since it is difficult to know the current state of the player, these hidden states make up the hidden part of the model. At each step in the sequence (a game), the model emits an observation, in this case, the quality of the game.

For example, if you are in the normal state and have a normal game, one would postulate that the player stayed in the normal state. But a high quality game or a low quality game might signify a transition to one of the other states.

Efficient algorithms exist to learn the probabilities of transitioning from one state to another, and also the probabilities of a particular observation given the current state. To train these models, however, we need games that are tagged with a good representation of the state.

Tagging can be difficult, however. One way would be to hire people to look at a game log for a player and find the hot, cold, and normal areas of the sequence. I tried that myself, and found it rather difficult to eyeball.

Instead, I used my version of an offensive game score to do the tagging. I looked at a six day sequence going forward from a particular game and took the average game score. Average game scores less than 46 were considered Cold. Average game scores greater than 56 were considered Hot. Those in between were considered normal.
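A tagger along these lines might look like the sketch below. It is not the actual tagging code: it assumes the window is six consecutive games in the log starting with the game being tagged, and that games near the end of the log without a full window are left untagged. The game score itself is taken as given, since its formula is not described here.

```python
from statistics import mean


def tag_states(game_scores, window=6, cold_below=46, hot_above=56):
    """Tag each game Cold, Normal, or Hot from the average of the next `window` scores.

    Assumption: the window starts at the game being tagged and counts games,
    not calendar days. Games without a full window ahead are not tagged.
    """
    tags = []
    for start in range(len(game_scores) - window + 1):
        average = mean(game_scores[start:start + window])
        if average < cold_below:
            tags.append("Cold")
        elif average > hot_above:
            tags.append("Hot")
        else:
            tags.append("Normal")
    return tags


# Made-up game scores, purely to show the shape of the output
scores = [40, 42, 44, 45, 43, 41, 60, 62, 58, 61, 59, 63]
print(tag_states(scores))
```

Because each tag looks forward, a game can be tagged Hot before the hot stretch is visible in the results, which is what makes the tags useful as training labels rather than as a live signal.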

I ran the tagger for six players and generated a series of averages based on the state to see if the tagger worked decently. Here are the results:

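| Player | State | Games | AB | Hits | Walks | Doubles | Triples | HR | K | BA | %GWH |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Luis Arraez | Cold | 82 | 3.28 | 0.66 | 0.24 | 0.04 | 0.000 | 0.01 | 0.30 | 0.201 | 52.4 |
| Luis Arraez | Normal | 401 | 3.78 | 1.22 | 0.33 | 0.20 | 0.022 | 0.05 | 0.32 | 0.324 | 75.6 |
| Luis Arraez | Hot | 53 | 3.85 | 1.92 | 0.40 | 0.45 | 0.038 | 0.08 | 0.19 | 0.500 | 88.7 |
| Jackie Bradley, Jr. | Cold | 547 | 2.76 | 0.38 | 0.21 | 0.10 | 0.009 | 0.03 | 0.90 | 0.139 | 32.5 |
| Jackie Bradley, Jr. | Normal | 593 | 3.49 | 0.95 | 0.38 | 0.23 | 0.030 | 0.13 | 0.92 | 0.271 | 65.9 |
| Jackie Bradley, Jr. | Hot | 42 | 3.67 | 1.64 | 0.57 | 0.33 | 0.095 | 0.36 | 0.67 | 0.448 | 88.1 |
| Freddie Freeman | Cold | 157 | 3.21 | 0.49 | 0.19 | 0.09 | 0.000 | 0.03 | 0.85 | 0.153 | 38.2 |
| Freddie Freeman | Normal | 1425 | 3.75 | 1.08 | 0.52 | 0.24 | 0.014 | 0.15 | 0.85 | 0.288 | 69.4 |
| Freddie Freeman | Hot | 303 | 3.85 | 1.64 | 0.55 | 0.41 | 0.030 | 0.34 | 0.64 | 0.426 | 86.1 |
| Spike Owen | Cold | 643 | 2.82 | 0.45 | 0.27 | 0.08 | 0.017 | 0.01 | 0.36 | 0.158 | 37.8 |
| Spike Owen | Normal | 866 | 3.43 | 0.99 | 0.44 | 0.18 | 0.052 | 0.04 | 0.32 | 0.289 | 68.5 |
| Spike Owen | Hot | 35 | 4.03 | 1.86 | 0.54 | 0.34 | 0.086 | 0.17 | 0.26 | 0.461 | 85.7 |
| Ken Phelps | Cold | 380 | 1.97 | 0.32 | 0.34 | 0.04 | 0.005 | 0.06 | 0.53 | 0.163 | 27.1 |
| Ken Phelps | Normal | 344 | 2.88 | 0.78 | 0.68 | 0.11 | 0.009 | 0.24 | 0.67 | 0.271 | 57.8 |
| Ken Phelps | Hot | 37 | 3.11 | 1.43 | 0.73 | 0.30 | 0.054 | 0.43 | 0.43 | 0.461 | 83.8 |
| Mike Trout | Cold | 66 | 3.09 | 0.39 | 0.29 | 0.08 | 0.000 | 0.08 | 0.98 | 0.127 | 34.8 |
| Mike Trout | Normal | 1090 | 3.67 | 1.00 | 0.62 | 0.19 | 0.033 | 0.21 | 1.07 | 0.272 | 67.5 |
| Mike Trout | Hot | 333 | 3.60 | 1.53 | 0.80 | 0.30 | 0.048 | 0.41 | 0.69 | 0.425 | 83.8 |

Averages per game by State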

There are four types of hitters represented here. Freeman and Trout are superstar batters. Owen and Bradley were known more for their defense, and they are seldom hot. They are in fact cold as much as they are normal. Ken Phelps is the power hitter who walks a lot. His profile is similar to the defensive specialist, seldom hot but very often cold. Finally, Arraez is the great hitter with little power, and the lack of power keeps him from being hot often. He is seldom cold, however.

Note that for each player, the hot state is associated with a low strikeout rate, while K rates don’t vary that much between normal and cold. It would seem that in hot states players are seeing the ball well and really driving it.

The column I’m most interested in would be the last one, percentage of games with a hit. Note how even the weak hitters are well over 80% when they are hot. If one is trying to choose between Arraez and Freeman on a particular day, knowing if one is hot might make a big difference.

So the idea would be to use this automated tagging to train an HMM. Use the HMM to figure out the probability of a player being hot at a particular point in time, then train a new neural network to take advantage of that parameter.
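For readers who have not seen how an HMM produces a "probability of being hot," here is a minimal sketch of forward filtering over three hidden states. Everything numeric in it is invented for illustration: the transition table, the emission table, the starting belief, and the bucketing of game quality into "poor", "average", and "great". In the real project those numbers would be learned from the tagged games.

```python
STATES = ("Cold", "Normal", "Hot")

# Illustrative numbers only, not learned from any data.
TRANSITION = {   # P(next state | current state)
    "Cold":   {"Cold": 0.80, "Normal": 0.18, "Hot": 0.02},
    "Normal": {"Cold": 0.10, "Normal": 0.80, "Hot": 0.10},
    "Hot":    {"Cold": 0.02, "Normal": 0.18, "Hot": 0.80},
}
EMISSION = {     # P(observed game quality | state)
    "Cold":   {"poor": 0.60, "average": 0.30, "great": 0.10},
    "Normal": {"poor": 0.30, "average": 0.40, "great": 0.30},
    "Hot":    {"poor": 0.10, "average": 0.30, "great": 0.60},
}
START = {"Cold": 0.2, "Normal": 0.6, "Hot": 0.2}  # belief before the first game


def filter_states(observations):
    """Forward algorithm with normalization: P(state | games seen so far)."""
    belief = dict(START)
    for obs in observations:
        # Predict: let the hidden state move one step.
        predicted = {
            s: sum(belief[prev] * TRANSITION[prev][s] for prev in STATES)
            for s in STATES
        }
        # Update: weight each state by how well it explains the game.
        weighted = {s: predicted[s] * EMISSION[s][obs] for s in STATES}
        total = sum(weighted.values())
        belief = {s: weighted[s] / total for s in STATES}
    return belief


after_streak = filter_states(["great", "great", "great", "great"])
print({state: round(p, 3) for state, p in after_streak.items()})
```

The value for "Hot" in the returned belief is the kind of per-game feature that could be handed to a downstream model.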

Wish me luck.

December 24, 2023

Swing High, Swing Low

Tom Tango compares the distributions of swing speeds for Luis Arraez and Ronald Acuna Jr. Arraez swings his bat much slower than Acuna, but each is most effective at the higher speed end of his distribution. As you might expect, Arraez makes more contact than Acuna, but Acuna’s power makes his high speed swings more valuable.

Tango:

That’s why you can’t just look at swing speed on its own: it really needs to be evaluated based on that batter’s swing distribution.

TangoTiger.com

This is also a very good example of how there are many ways in baseball to be successful. The consensus today seems to be that every hitter needs to be like Acuna. Therefore, one would think that some smart team might sign a bunch of Arraez-like players cheaply and win. (See the Cardinals of the 1980s.) It could be, however, that those players are few and far between, since youngsters are taught to swing like a power hitter.

It’s not clear how we get to a better mix of Acunas and Arraezes. I think it would be a more interesting game if we did.