The Normalization Debate
A little while ago, somebody bumped up this thread on APBA: Between The Lines.
Most of the posts have been deleted, and, sadly, Keith Russell, who started the discussion, passed away in early 2021. Keith’s part in the debate has been entirely removed, which makes the discussion kind of hard to follow.
Having said that, normalization is actually a fascinating topic. This is where the theory of baseball simulation starts to run into places that it’s not “supposed” to go.
Long story short: normalization is what happens when you start mixing eras. It could involve a single player, a full team, or even several teams. It comes into play any time you take a team, a player, or anything else in a simulation that was designed for one era and stick it into another era.
And it happens even if you don’t make any era-specific adjustments.
The whole debate gets pretty confusing in a hurry, and it causes a lot of longtime baseball fans to start wondering why we’re messing around with statistics that most of us keep sacred. Let me see if I can help the debate make some sense.
Can I play teams from one year against teams from another year?
Of course you can. You can do whatever you want.
However, as soon as you cross that Rubicon, you’re going to have to deal with everything that comes with it.
You can certainly take teams from, say, the old 1927 Strat-O-Matic baseball set and play them in a league with teams from the 1974 Strat-O-Matic baseball set. However, what you need to realize is that Strat-O-Matic teams are all calibrated for the season that they played in. In other words, the statistics you get in your project will very likely be skewed. Perhaps the starting pitchers from the 1927 teams will have an advantage because of how they were carded. Perhaps the 1927 Yankees hitters will outhit every 1974 team in terms of power. Perhaps the 1974 teams will perform better because of their superior bullpens. It’s hard to say.
Regardless of the game you use, when you mix up teams from different eras like this, you are already making normalization adjustments. These adjustments depend entirely on how the game you chose creates its cards (or, for computer games, how it interprets the data).
Different games will do different things. If you use the APBA basic game, the 1927 Babe Ruth will likely hit about 60 home runs regardless of the era you stick him in because of how APBA is constructed. If you use Strat-O-Matic, he’ll get part of the way there, but how likely he is to hit 60 home runs will depend on the pitcher cards he faces. This is because the pitcher cards will likely determine the result about 50% of the time.
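Here is a rough sketch of that arithmetic, in Python. It is not how Strat-O-Matic or APBA actually build their cards. It only assumes, as described above, that some share of plate appearances is resolved on the pitcher's card and the rest on the hitter's card. The function names and the rates in the usage note are hypothetical and are there only to show the mechanics.

```python
def blended_hr_rate(batter_card_rate: float,
                    pitcher_card_rate: float,
                    pitcher_share: float = 0.5) -> float:
    """Home run chance per plate appearance when a result can come
    from either card.

    pitcher_share is the assumed fraction of plate appearances that
    are resolved on the pitcher's card (0.5 for an even split).
    """
    if not 0.0 <= pitcher_share <= 1.0:
        raise ValueError("pitcher_share must be between 0 and 1")
    batter_share = 1.0 - pitcher_share
    return batter_share * batter_card_rate + pitcher_share * pitcher_card_rate


def expected_home_runs(hr_rate: float, plate_appearances: int) -> float:
    """Expected home runs over a season at a fixed per-PA rate."""
    return hr_rate * plate_appearances
```

With made-up numbers, a hitter whose card gives up a home run 16% of the time, facing pitcher cards that allow one 2% of the time, comes out at `blended_hr_rate(0.16, 0.02)`, or about 9% per plate appearance. Setting `pitcher_share=0.0` models a game where the hitter's card decides everything, so his total no longer depends on who is pitching.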
The normalization discussion starts when you realize that the cards are normalized already, and when you understand that you can make some additional adjustments to make the statistics feel right.
What happens if we just stick the best players together?
This is honestly impossible to answer for sure.
Back in The 1985 Baseball Abstract, Bill James wrote a fascinating article about major league equivalency ratings. This is back when most major league baseball clubs didn’t pay attention to minor league statistics, back when there was a built-in bias towards “experienced” players regardless of how bad they were.
James reminded us of a fact that we often forget. We shouldn’t think of major league baseball talent in terms of a “normal curve.” Instead, we should think of it as the extreme right end of a normal curve:
Now, baseball is a sport of balance. It’s not a sport like track and field, where we can measure how fast somebody ran the mile or how high they jumped or whatever. In baseball, statistics only have meaning in the overall context of the league.
For example, when Pedro Martinez had a 2.07 ERA in 1999, that number was amazing because of how good the offense of the 1999 American League was. Similarly, when Babe Ruth hit 54 home runs in 1920, that feat was remarkable largely because of how good pitching in the 1920 American League was.
If we were to stick the greatest teams of all time in a league and have them play against each other, what we are essentially doing is moving the line in the middle of the graph above over to the right. The same thing happens when we use a “greatest players of all time” card set or disk.
Because we’re making this adjustment, we cannot assume that players will perform at the exact level they performed at in real life. Babe Ruth has to face Pedro Martinez, for example, and Pedro has to face the Babe. I don’t know if one will win or the other will win, but I strongly suspect that Martinez isn’t going to perform as well as he did in 1999, and that Ruth isn’t going to perform as well as he did in 1920. The average level of talent has changed.
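To put the right-tail idea into numbers, here is a minimal Python sketch. It assumes talent follows a standard normal curve and that a league is simply everybody above some cutoff. Both are simplifying assumptions of mine, and the cutoffs and function names are hypothetical. Raising the cutoff, which is what an all-time-greats league does, raises the league average and shrinks the edge any one star has over it.

```python
from statistics import NormalDist


def mean_talent_above(cutoff: float) -> float:
    """Average talent (in standard deviations) of everyone above
    `cutoff` on a standard normal curve."""
    curve = NormalDist()  # mean 0, standard deviation 1
    share_above = 1.0 - curve.cdf(cutoff)
    # Mean of a normal distribution truncated below at `cutoff`.
    return curve.pdf(cutoff) / share_above


def edge_over_league(player_talent: float, cutoff: float) -> float:
    """How far a player sits above the average of a league made up
    of everyone above `cutoff`."""
    return player_talent - mean_talent_above(cutoff)
```

A player sitting 3.5 standard deviations out has a much bigger edge in a league cut off at 2.0 than in one cut off at 3.0: compare `edge_over_league(3.5, 2.0)` with `edge_over_league(3.5, 3.0)`. The player did not get worse. The average moved.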
So what kind of adjustments do we make?
Those who understand normalization tend to adjust players to the same baseball era. Most people tend to choose a statistical era that mirrors the late 1970s, back when there were complete games and saves, base stealers and slow players, power hitters and slap hitters, and so on.
The idea is to adjust players to get them to perform similarly on average to how players performed in whatever era you chose.
That means that we’ll have to bring the outliers down a little bit. Maybe somebody like Babe Ruth winds up hitting 45 or 50 home runs instead of 60. Maybe the 1968 Bob Gibson winds up with an ERA closer to 2.00 than 1.00. Maybe Ty Cobb leads the league with 80 steals.
The idea is to make the most talented players look like the most talented players in the target era, and then adjust everybody else accordingly.
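One simple way to sketch that idea in Python is a percentile mapping: find where a player ranks in his own league, then give him the number that the same rank earned in the target era. This is only one possible approach, not the method any particular game uses, and the function names and league totals below are invented for illustration.

```python
from bisect import bisect_right


def percentile_rank(value: float, league_values: list[float]) -> float:
    """Fraction of the league at or below `value`."""
    ordered = sorted(league_values)
    return bisect_right(ordered, value) / len(ordered)


def value_at_percentile(pct: float, league_values: list[float]) -> float:
    """League value at the nearest rank for percentile `pct` (0 to 1)."""
    ordered = sorted(league_values)
    index = round(pct * len(ordered)) - 1
    index = max(0, min(len(ordered) - 1, index))  # keep index in range
    return ordered[index]


def normalize_to_era(value: float,
                     source_league: list[float],
                     target_league: list[float]) -> float:
    """Map a stat from its own league to the same rank in another era."""
    pct = percentile_rank(value, source_league)
    return value_at_percentile(pct, target_league)


# Hypothetical home run totals, NOT real league data.
source_era = [2, 5, 8, 12, 60]
target_era = [5, 10, 18, 25, 45]
top_slugger = normalize_to_era(60, source_era, target_era)
```

In this toy example the 60-homer outlier becomes the best slugger in the target era rather than a hitter nobody in that era resembles, and everybody below him slides into place the same way.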
But isn’t that changing history?
Yep — it absolutely is.
But you still have to do it.
The problem is that Babe Ruth is either going to hit Pedro Martinez or he isn’t. If Ruth hits a home run and makes Martinez look awful, Pedro’s pitching statistics will suffer, and the replayer will complain that the game isn’t realistic. And, if Ruth strikes out all the time against Pedro, Ruth’s batting stats will suffer, and the replayer will complain again.
Instead of thinking of statistics as some sort of sacred cow, think about baseball statistics as a relative measure of player talent. They only mean something when you take them in the context of the other players those players were playing against. If you’re going to take that raw talent and move it to another era, you’d naturally expect that the resulting statistics would be different.
Would Barry Bonds hit 40 home runs in 1908?
Almost certainly not.
Tim Jordan managed 12 home runs for the Brooklyn Superbas in 1908. He led the major leagues.
This is where you need to be careful with algorithmic approaches to normalization. While Bonds theoretically could go back to 1908 and turn into the greatest hitter of all time, chances are that it’s not going to happen. In fact, chances are that Bonds will hit something like Tim Jordan, though with a few more doubles (Jordan had only 18), and probably more triples (Jordan had only 5).
Hitters struggled in 1908, but not because they were awful players. They struggled because they played day games that tended to start after 3:30 PM. They struggled because they had to deal with shadows on the field, white shirts in the bleachers right behind the pitcher, and all sorts of trick pitches. They struggled because they frequently couldn’t even see the ball, especially in late innings.
And they struggled because of strategy. “Scientific baseball” in those days meant trying to steal bases even when it made no sense. The emphasis was on the bunt, and sacrificing outs carried with it some sort of moral weight, as if bunting were some kind of badge of honor.
If you assume that Bonds had the same kind of training and experience and coaching that other players in 1908 had, he probably would perform similarly to them. He wouldn’t hit 40 home runs.
But aren’t today’s players objectively the greatest of all time?
You could argue that today’s players are better than any players in the history of the game, sure.
But is it an objective argument? Nope.
I’ve been involved in all sorts of arguments about this subject over the past few decades. They tend to go around in circles. You’ve got one camp that insists that modern athletes are simply better trained and in better shape, and that any kind of cross-era comparison is futile. You’ve got another camp that will point out that players in the past had to take bumpy train rides and play in the afternoon with all sorts of dust and gunk in the air, and that today’s pampered modern athlete wouldn’t stand a chance against all those shine balls.
Actually, the evidence indicates that the highest echelon of Major League Baseball talent has probably been more or less steady over time.
We know this because star players don’t suddenly collapse in ways that don’t match normal aging patterns (well, you have to ignore Albert Pujols, I suppose). If you assume that the players of 1971 were miles above the players from 1951, thanks to integration, better health, nutrition, and so on, you’re going to have to explain how Willie Mays could still compete on the highest level in 1971. If you think the game in 1998 was miles above where it was in 1987, you’ll have to explain how in the world Mark McGwire could continue to dominate the pitching he faced.
Even the steroid argument doesn’t work. The problem, of course, is that both pitchers and hitters used steroids. And then there’s the little fact that steroids aren’t magic home run pills. In fact, most of the players we know for sure used steroids were marginal players, usually players looking for a workout edge to help them after a career-threatening injury.
Anyway, the truth is that the best way to envision having the best players and teams of all time meet together is to assume that the overall level of play has been more or less consistent over time. Personally, I think there has likely been a slight improvement, largely due to the gradual end of racially based exclusion (something that continues to be a barrier to great Japanese players to this day).
What about roster sizes?
This is where the idea of a greatest teams league runs into problems.
The 1904 New York Giants won 106 games and lost only 47. They absolutely belong in the discussion of greatest teams of all time — though most people look past them and straight at that 1905 Giants team, which was just as good.
Baseball Reference will tell you that the 1904 Giants used 9 pitchers all season. That seems low by today’s standards.
However, Frank Bowerman and Jack Dunn were not actually pitchers. Bowerman pitched an inning in a meaningless late September game against Pittsburgh; the Giants lost, 7-0. Dunn pitched four innings in a dominating win in Philadelphia in late May.
It gets worse. Claude Elliott pitched 3 games all season, logging a grand total of 14 innings pitched for the Giants. At least Elliott was actually a pitcher; he went from Cincinnati to New York in late August.
Meanwhile, Billy Milligan pitched 25 innings at the beginning of the season, and seems to have been released. His reported 5.40 ERA tells us that he didn’t pitch all that well, though I’m going to have to do more research to figure things out.
That leaves us with 5 pitchers.
But wait. Red Ames pitched in only 16 games that season. In fact, the 24-year-old youngster wasn’t even with the team until July 4.
Hooks Wiltse, a rookie, only appeared in 24 games all season.
That leaves you with only 3 pitchers.
How do you account for this? How can a team that basically only had 3 pitchers play against the 2022 Dodgers?
I honestly don’t know the answer. I do know, however, that most computer games won’t give you “realistic” results. You’d need to play with a board game, or you’d need to somehow set the computer manager to keep the starting pitcher in no matter what.
Remember that McGinnity went 35-8 for the Giants that year, and Mathewson went 33-12. The fact that they had two pitchers win over 30 games should tell you something about how that bullpen was organized.
Now, does this mean the 1904 Giants can’t compete, or that they shouldn’t be part of the story, or that baseball in 1904 was hopelessly primitive, or that we shouldn’t even try? Of course not.
It does mean, however, that we need to be careful when we mess around with projects like this. Not even normalization will save you from an extremely small roster.
Really interesting article. Baseball cross-era is something that I enjoy but at the same time never can wrap my head around.
Great article/writing...one of your best to date. thank you