
Can Mike Trout hit .400? Photo: Jayne Kamin-Oncea-USA TODAY Sports
Bayes’ Rule allows the calculation of a difficult-to-measure conditional probability in terms of easier-to-measure probabilities. I use batting average to remember the rule:
p(H|AB) = p(AB|H)*p(H)/p(AB)
The first term, p(AB|H), is what I call the prior. (Strictly speaking, Bayesian terminology calls this term the likelihood and reserves “prior” for p(H); I’ll stick with “prior” here.) It can be thought of as the share of hits that meet the condition. If that share is small, the result stays low even when the quotient p(H)/p(AB) is high. In the case of batting average, the prior is 1.0, since every hit is an at bat. So:
p(H|AB) = 1.0*p(H)/p(AB) = (H/PA)/(AB/PA) = H/AB
That is how we usually think of batting average. I do like the previous step, though, which shows batting average as a ratio of probabilities. Tony Gwynn and Wade Boggs owned similar batting averages: Gwynn had a higher H/PA, Boggs had a lower AB/PA, so things evened out. The players with high BAs usually have a higher H/PA and a lower AB/PA compared to the league.
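Here is a minimal Python sketch of that calculation. The function name and the season totals are hypothetical, chosen only to show that the Bayes form and the familiar H/AB give the same number.

```python
def batting_average_bayes(hits, at_bats, plate_appearances):
    """Return p(H|AB) computed as p(AB|H) * p(H) / p(AB)."""
    p_ab_given_h = 1.0                   # every hit is an at bat
    p_h = hits / plate_appearances       # H/PA
    p_ab = at_bats / plate_appearances   # AB/PA
    return p_ab_given_h * p_h / p_ab


# Hypothetical season totals, for illustration only.
hits, at_bats, plate_appearances = 180, 550, 650

print(round(batting_average_bayes(hits, at_bats, plate_appearances), 3))
print(round(hits / at_bats, 3))  # the usual H/AB, same value
```

Plate appearances cancel out of the ratio, which is why the two printed values agree.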
We can use Bayes to look at BABIP, batting average on balls in play, as well. I always find “balls in play” to be a bit ambiguous. In this instance, we’re talking about balls in play that stay in the field of play, so we eliminate home runs. So while we usually express BABIP* as (Hits-HR)/(AB-(HR+K)), we can also express it like this:
BABIP = p(H|InP) = p(InP|H)*p(H)/p(InP)
*I’m calling these calculations batting average, but in reality they are looking at probabilities based on plate appearances, not at bats. I tend to call those hit averages. For BABIP, there is very little difference between the two. For the purposes of this article, I’m keeping the names the same, and in the end it won’t make much difference.
If we break this down, the prior, p(InP|H), is simply the fraction of hits that are not home runs. So BABIP is hits per ball in play, reduced by the prior. One way to generate a high BABIP is to own a high probability of getting a hit with a low probability of putting the ball in play, and a high probability that the hit is not a home run.
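The same decomposition can be sketched in Python. The function name and the season totals are hypothetical, and balls in play are counted as AB-(HR+K), following the formula used in this article.

```python
def babip_bayes(hits, home_runs, at_bats, strikeouts, plate_appearances):
    """Return p(H|InP) computed as p(InP|H) * p(H) / p(InP)."""
    balls_in_play = at_bats - (home_runs + strikeouts)
    p_inp_given_h = (hits - home_runs) / hits    # share of hits that stay in play
    p_h = hits / plate_appearances               # hits per plate appearance
    p_inp = balls_in_play / plate_appearances    # balls in play per plate appearance
    return p_inp_given_h * p_h / p_inp


# Hypothetical season totals, for illustration only.
hits, home_runs, at_bats, strikeouts, plate_appearances = 180, 25, 550, 100, 650

print(round(babip_bayes(hits, home_runs, at_bats, strikeouts, plate_appearances), 3))
print(round((hits - home_runs) / (at_bats - (home_runs + strikeouts)), 3))  # direct form
```

Both lines print the same value, since the Bayes form reduces to (Hits-HR)/(AB-(HR+K)).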
We can also express BABnIP, batting average on balls not in play, in a similar fashion (~ signifies a logical not):
BABnIP = p(H|~InP) = p(~InP|H)*p(H)/p(~InP)
The prior here represents the proportion of hits that are home runs, since home runs are the only hits not in play. We can also figure out p(K|~InP), which I’ll call KBnIP, but the prior there is trivial, since all strikeouts are balls not in play. The probability is therefore the number of strikeouts divided by the number of non-in-play plate appearances (HR+BB+HBP+K).
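A short Python sketch of both not-in-play rates follows. The function name and the season totals are hypothetical; the only assumption carried over from the text is that non-in-play plate appearances are HR+BB+HBP+K.

```python
def not_in_play_rates(hits, home_runs, walks, hit_by_pitch, strikeouts,
                      plate_appearances):
    """Return (BABnIP, KBnIP) using the Bayes form for each."""
    not_in_play = home_runs + walks + hit_by_pitch + strikeouts
    p_nip = not_in_play / plate_appearances

    # BABnIP = p(~InP|H) * p(H) / p(~InP); the prior is the HR share of hits.
    babnip = (home_runs / hits) * (hits / plate_appearances) / p_nip

    # KBnIP = p(~InP|K) * p(K) / p(~InP); the prior is 1.0 because every
    # strikeout is a ball not in play.
    kbnip = 1.0 * (strikeouts / plate_appearances) / p_nip
    return babnip, kbnip


# Hypothetical season totals, for illustration only.
babnip, kbnip = not_in_play_rates(hits=180, home_runs=25, walks=80,
                                  hit_by_pitch=5, strikeouts=100,
                                  plate_appearances=650)
print(round(babnip, 3), round(kbnip, 3))
```

As with batting average, plate appearances cancel, so BABnIP is just HR/(HR+BB+HBP+K) and KBnIP is K/(HR+BB+HBP+K).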
We now have three probabilities that help us describe the hitter:
- His BABIP, which we can think of as a limiting quantity. For most players, BABIP places a ceiling on their batting average.
- KBnIP, which knocks down BABIP. Strikeouts cost batters BABIP hits. If someone has a BABIP of .300, ten strikeouts will cost him three hits, lowering his batting average.
- BABnIP, which represents the player’s ability to hit home runs. Home runs can make up for hits lost to strikeouts.
In fact, if HR/(HR+K) = BABIP, the batter has balanced his negatives and positives and will hit his BABIP! As that ratio grows so does a player’s batting average. Let’s call that the HR-K-Ratio. When the HR-K-Ratio is above a player’s BABIP, his balls not in play are increasing his batting average. (I’m going to calculate this using BABnIP as HR and KBnIP as strikeouts.)
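To see the balance point in numbers, here is a small Python sketch. The function names and the sample figures are hypothetical, and at bats are treated as balls in play plus home runs plus strikeouts, as in the BABIP formula above.

```python
def hr_k_ratio(home_runs, strikeouts):
    """HR-K-Ratio: HR / (HR + K)."""
    return home_runs / (home_runs + strikeouts)


def average_from_components(balls_in_play, babip, home_runs, strikeouts):
    """Batting average built from in-play hits plus home runs."""
    hits = babip * balls_in_play + home_runs
    at_bats = balls_in_play + home_runs + strikeouts
    return hits / at_bats


# Hypothetical hitter with a .300 BABIP on 400 balls in play.
balanced = average_from_components(400, 0.300, home_runs=30, strikeouts=70)
stronger = average_from_components(400, 0.300, home_runs=40, strikeouts=60)

print(round(hr_k_ratio(30, 70), 3), round(balanced, 3))  # ratio equals BABIP
print(round(hr_k_ratio(40, 60), 3), round(stronger, 3))  # ratio above BABIP
```

In the first case the ratio matches the BABIP and the batting average comes out at the BABIP; in the second the ratio is higher and the average rises above it.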
This Google spreadsheet looks at eleven different player seasons to show how various players arrive at their final batting averages. The players on the list who came closest to balancing their BABIP, HR, and strikeouts were Mark McGwire in 1998 and Dave Kingman in 1979, both low-average sluggers. The two people I’d like you to pay the most attention to, however, are Ted Williams in 1941 and Ty Cobb in 1911. Both hit over .400 in those seasons.
The two approached .400 very differently. Cobb played in an era of few home runs, so to hit .400, a player needed an extremely high BABIP. Cobb came through with a .436 mark. Ty was fast, and may have been the ultimate bat-control hitter. He tried to direct his batted balls to spots where they would be difficult to field, or where he could beat them out with his left-handed speed. Think of an extreme Ichiro Suzuki. Cobb only hit eight home runs, and both his strikeout and walk numbers were low. Although his Ks were low, they still managed to hurt his batting average, as he hit .420 for the season, below his BABIP.
Williams, on the other hand, brought in a BABIP of .378. Ted, however, homered more than he struck out, 37 homers to 27 K. That gave Ted an impressive .578 HR-K-Ratio. His high home run rate and low K rate put Ted over the .400 barrier. (Note that Joe DiMaggio had an even better HR-K-Ratio that season, but his BABIP was only .327.)
Which brings us to Mike Trout. Trout posted a .383 BABIP in 2012. That put him within spitting distance of .400, but his 139 strikeouts caused his HR-K-Ratio to be just .178, well below his BABIP.
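The comparison between the two hitters can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, Williams’ ratio is computed from the 37 home runs and 27 strikeouts quoted above, and Trout’s ratio is taken as the .178 quoted above rather than recomputed.

```python
def hr_k_ratio(home_runs, strikeouts):
    """HR-K-Ratio: HR / (HR + K)."""
    return home_runs / (home_runs + strikeouts)


def not_in_play_effect(ratio, babip):
    """Say whether balls not in play push the average above or below BABIP."""
    if ratio > babip:
        return "raises"
    if ratio < babip:
        return "lowers"
    return "neutral"


williams_ratio = hr_k_ratio(37, 27)   # counts quoted in the text
trout_ratio = 0.178                   # ratio quoted in the text

print(round(williams_ratio, 3), not_in_play_effect(williams_ratio, 0.378))
print(trout_ratio, not_in_play_effect(trout_ratio, 0.383))
```

Williams’ ratio sits well above his .378 BABIP, so his home runs and strikeouts together lifted his average, while Trout’s ratio sits below his .383 BABIP and pulled his down.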
So what does Mike need to do to hit .400? To just hit his BABIP, his KBnIP and BABnIP (home runs) must have the following relationship:
KBnIP = (1/BABIP - 1)*BABnIP
For someone with a .383 BABIP and a BABnIP rate of .124, a KBnIP of .20 brings him in line with his BABIP. That means Trout needs to cut his K’s 65% to hit around .380.
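The break-even strikeout rate follows from setting HR/(HR+K) equal to BABIP and solving for K. A minimal Python sketch, with a hypothetical function name and the .383 and .124 figures from the text:

```python
def break_even_kbnip(babip, babnip):
    """KBnIP at which HR/(HR+K) equals BABIP: (1/BABIP - 1) * BABnIP."""
    return (1.0 / babip - 1.0) * babnip


target = break_even_kbnip(babip=0.383, babnip=0.124)
print(round(target, 2))

# Check: at the break-even rate, the HR-K-Ratio equals the BABIP.
print(round(0.124 / (0.124 + target), 3))
```

Because both rates share the same denominator (non-in-play plate appearances), the ratio BABnIP/(BABnIP+KBnIP) is the same as HR/(HR+K).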
Can he bring that rate down? High home run rates and high strikeout rates go hand in hand, one reason we keep seeing the MLB K rate rise. However, as Trout matures and becomes a more dangerous hitter, I suspect he will walk more as pitchers take fewer chances with him. If he shows a good eye at the plate, he might even get more ball calls on 3-2 counts.
His home run power should increase as well, and a higher home run rate would allow him to carry a higher K rate.
The biggest impediment, however, may be his high BABIP. Joey Votto currently owns the highest career BABIP in the majors (minimum 2000 PA), about .360. That’s around where Rod Carew finished for his career, and probably a more likely year-to-year average for Trout. He will need a very high ratio of HR to Ks to reach .400 at that level.
So as the years go on, keep your eyes on Trout’s BABIP and his strikeouts. If he can keep the former high while lowering the latter, we could see an historic season.

