December 26, 2023

Hot and Cold

For a while I have thought it possible to improve the Beat the Streak Picks by estimating the probability that a hitter is currently hot or cold. The problem strikes me as a fit for a hidden Markov model (HMM). The idea would be to assign a player one of three states at a given time: cold, normal, or hot. Since it is difficult to know a player's current state, these states make up the hidden part of the model. At each step of the sequence (a game), the model emits an observation, in this case the quality of the game.

For example, if a player is in the normal state and has a normal game, one would postulate that the player stayed in the normal state. A high-quality or low-quality game, however, might signal a transition to one of the other states.

Efficient algorithms exist to learn the probabilities of transitioning from one state to another, and also the probabilities of a particular observation given the current state. To train these models, however, we need games that are tagged with a good representation of the state.
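When every game already carries a state tag, one simple way to get those two sets of probabilities is to count: how often each state follows each other state, and how often each kind of game shows up in each state. The sketch below illustrates that counting approach and is not necessarily the training method I will end up using. The three observation bins (`low`, `medium`, `high`) and the `smoothing` pseudo-count are assumptions made for illustration.

```python
from collections import Counter

STATES = ["cold", "normal", "hot"]
OBSERVATIONS = ["low", "medium", "high"]  # hypothetical game-quality bins


def estimate_hmm(tagged_games, smoothing=1.0):
    """Estimate transition and emission probabilities by counting.

    tagged_games is a list of (state, observation) pairs in game order.
    smoothing is a pseudo-count (an assumption of this sketch) that keeps
    unseen events from getting a probability of exactly zero.
    """
    trans_counts = Counter()
    emit_counts = Counter()
    for state, obs in tagged_games:
        emit_counts[(state, obs)] += 1
    # Pair each game with the next one to count state-to-state moves.
    for (prev, _), (curr, _) in zip(tagged_games, tagged_games[1:]):
        trans_counts[(prev, curr)] += 1

    transition, emission = {}, {}
    for s in STATES:
        t_total = sum(trans_counts[(s, t)] for t in STATES) + smoothing * len(STATES)
        transition[s] = {
            t: (trans_counts[(s, t)] + smoothing) / t_total for t in STATES
        }
        e_total = sum(emit_counts[(s, o)] for o in OBSERVATIONS) + smoothing * len(OBSERVATIONS)
        emission[s] = {
            o: (emit_counts[(s, o)] + smoothing) / e_total for o in OBSERVATIONS
        }
    return transition, emission
```

Each row of `transition` and `emission` sums to one, so the result can be fed directly into a standard HMM inference routine.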

Tagging can be difficult. One way would be to hire people to look at a player's game log and mark the hot, cold, and normal stretches of the sequence. I tried that myself and found it rather difficult to eyeball.

Instead, I used my version of an offensive game score to do the tagging. I looked at a six-day sequence going forward from a particular game and took the average game score. An average below 46 was tagged Cold, an average above 56 was tagged Hot, and anything in between was tagged Normal.
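A minimal sketch of such a tagger follows. It takes a list of per-game scores and uses the 46 and 56 thresholds described above. Several details are assumptions made for illustration: the window is counted as six games rather than six calendar days, it includes the game being tagged, and it simply shrinks near the end of the log. The function name `tag_games` is hypothetical.

```python
def tag_games(game_scores, window=6, cold_below=46, hot_above=56):
    """Tag each game Cold, Normal, or Hot from a forward-looking average.

    game_scores is a list of offensive game scores in chronological order.
    """
    tags = []
    for i in range(len(game_scores)):
        # Assumption: the window starts at game i and is shorter near the
        # end of the log, where fewer than `window` games remain.
        span = game_scores[i:i + window]
        avg = sum(span) / len(span)
        if avg < cold_below:
            tags.append("Cold")
        elif avg > hot_above:
            tags.append("Hot")
        else:
            tags.append("Normal")
    return tags
```

Because the average looks forward, the tag describes the stretch a player is about to have, which is what makes it usable as a training label but not as a prediction on its own.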

I ran the tagger for six players and generated a series of averages based on the state to see if the tagger worked decently. Here are the results:

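| Player | State | Games | AB | Hits | Walks | Doubles | Triples | HR | K | BA | %GWH |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Luis Arraez | Cold | 82 | 3.28 | 0.66 | 0.24 | 0.04 | 0.000 | 0.01 | 0.30 | 0.201 | 52.4 |
| Luis Arraez | Normal | 401 | 3.78 | 1.22 | 0.33 | 0.20 | 0.022 | 0.05 | 0.32 | 0.324 | 75.6 |
| Luis Arraez | Hot | 53 | 3.85 | 1.92 | 0.40 | 0.45 | 0.038 | 0.08 | 0.19 | 0.500 | 88.7 |
| Jackie Bradley, Jr. | Cold | 547 | 2.76 | 0.38 | 0.21 | 0.10 | 0.009 | 0.03 | 0.90 | 0.139 | 32.5 |
| Jackie Bradley, Jr. | Normal | 593 | 3.49 | 0.95 | 0.38 | 0.23 | 0.030 | 0.13 | 0.92 | 0.271 | 65.9 |
| Jackie Bradley, Jr. | Hot | 42 | 3.67 | 1.64 | 0.57 | 0.33 | 0.095 | 0.36 | 0.67 | 0.448 | 88.1 |
| Freddie Freeman | Cold | 157 | 3.21 | 0.49 | 0.19 | 0.09 | 0.000 | 0.03 | 0.85 | 0.153 | 38.2 |
| Freddie Freeman | Normal | 1425 | 3.75 | 1.08 | 0.52 | 0.24 | 0.014 | 0.15 | 0.85 | 0.288 | 69.4 |
| Freddie Freeman | Hot | 303 | 3.85 | 1.64 | 0.55 | 0.41 | 0.030 | 0.34 | 0.64 | 0.426 | 86.1 |
| Spike Owen | Cold | 643 | 2.82 | 0.45 | 0.27 | 0.08 | 0.017 | 0.01 | 0.36 | 0.158 | 37.8 |
| Spike Owen | Normal | 866 | 3.43 | 0.99 | 0.44 | 0.18 | 0.052 | 0.04 | 0.32 | 0.289 | 68.5 |
| Spike Owen | Hot | 35 | 4.03 | 1.86 | 0.54 | 0.34 | 0.086 | 0.17 | 0.26 | 0.461 | 85.7 |
| Ken Phelps | Cold | 380 | 1.97 | 0.32 | 0.34 | 0.04 | 0.005 | 0.06 | 0.53 | 0.163 | 27.1 |
| Ken Phelps | Normal | 344 | 2.88 | 0.78 | 0.68 | 0.11 | 0.009 | 0.24 | 0.67 | 0.271 | 57.8 |
| Ken Phelps | Hot | 37 | 3.11 | 1.43 | 0.73 | 0.30 | 0.054 | 0.43 | 0.43 | 0.461 | 83.8 |
| Mike Trout | Cold | 66 | 3.09 | 0.39 | 0.29 | 0.08 | 0.000 | 0.08 | 0.98 | 0.127 | 34.8 |
| Mike Trout | Normal | 1090 | 3.67 | 1.00 | 0.62 | 0.19 | 0.033 | 0.21 | 1.07 | 0.272 | 67.5 |
| Mike Trout | Hot | 333 | 3.60 | 1.53 | 0.80 | 0.30 | 0.048 | 0.41 | 0.69 | 0.425 | 83.8 |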
Averages per game by State
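
For anyone who wants to produce a similar summary, here is a rough sketch of the grouping step for a few of the columns. It assumes each tagged game is a dict with the hypothetical keys `state`, `ab`, and `hits`; the other per-game averages would follow the same pattern. BA is computed as total hits over total at bats within the state, and the last figure is the percentage of games with at least one hit.

```python
from collections import defaultdict


def summarize_by_state(games):
    """Summarize tagged games per state.

    games is a list of dicts with the hypothetical keys
    'state', 'ab' (at bats), and 'hits'.
    """
    grouped = defaultdict(list)
    for g in games:
        grouped[g["state"]].append(g)

    summary = {}
    for state, rows in grouped.items():
        n = len(rows)
        total_ab = sum(r["ab"] for r in rows)
        total_hits = sum(r["hits"] for r in rows)
        summary[state] = {
            "games": n,
            "ab_per_game": total_ab / n,
            "hits_per_game": total_hits / n,
            # Guard against a state with no official at bats.
            "ba": total_hits / total_ab if total_ab else 0.0,
            "pct_games_with_hit": 100.0 * sum(1 for r in rows if r["hits"] > 0) / n,
        }
    return summary
```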

There are four types of hitters represented here. Freeman and Trout are superstar batters. Owen and Bradley were known more for their defense, and they are seldom hot. They are in fact cold nearly as often as they are normal. Ken Phelps is the power hitter who walks a lot. His profile is similar to that of the defensive specialists: seldom hot but very often cold. Finally, Arraez is the great hitter with little power, and the lack of power keeps him from being hot often. He is seldom cold, however.

Note that for each player, the hot state is associated with a low strikeout rate, while K rates don’t vary that much between normal and cold. It would seem that in hot states players are seeing the ball well and really driving it.

The column I’m most interested in is the last one, the percentage of games with a hit (%GWH). Note how even the weak hitters are well over 80% when they are hot. If one is trying to choose between Arraez and Freeman on a particular day, knowing whether one of them is hot might make a big difference.

So the idea would be to use this automated tagging to train an HMM, use the HMM to estimate the probability that a player is hot at a particular point in time, and then train a new neural network to take advantage of that parameter.
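
To make the second step concrete, the standard way to get the probability of each hidden state after every game is the forward (filtering) pass of an HMM. The sketch below is only an illustration: the start, transition, and emission numbers are made up and not estimated from any data in this post, and the `low`/`medium`/`high` game-quality bins are an assumption.

```python
STATES = ["cold", "normal", "hot"]

# Illustrative numbers only, NOT estimated from any real data.
START = {"cold": 0.2, "normal": 0.7, "hot": 0.1}
TRANSITION = {
    "cold":   {"cold": 0.80, "normal": 0.19, "hot": 0.01},
    "normal": {"cold": 0.05, "normal": 0.90, "hot": 0.05},
    "hot":    {"cold": 0.01, "normal": 0.19, "hot": 0.80},
}
EMISSION = {
    "cold":   {"low": 0.6, "medium": 0.3, "high": 0.1},
    "normal": {"low": 0.3, "medium": 0.4, "high": 0.3},
    "hot":    {"low": 0.1, "medium": 0.3, "high": 0.6},
}


def filter_states(observations, start=START, transition=TRANSITION,
                  emission=EMISSION):
    """Return P(state after game t | games 1..t) for every game t."""
    beliefs = []
    prior = dict(start)
    for obs in observations:
        # Weight each state's prior by how likely it is to emit this game.
        unnorm = {s: prior[s] * emission[s][obs] for s in STATES}
        total = sum(unnorm.values())
        posterior = {s: unnorm[s] / total for s in STATES}
        beliefs.append(posterior)
        # Push the belief through the transitions to get the next prior.
        prior = {
            t: sum(posterior[s] * transition[s][t] for s in STATES)
            for t in STATES
        }
    return beliefs
```

Calling `filter_states(["high", "high", "high"])` returns one probability distribution per game, and the `"hot"` entry of the last one is the kind of number that could be handed to the neural network as an extra input.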

Wish me luck.
