What is More Important in Historical Leagues: IRL Stats or Ratings?

chucksabr · 02-07-2023, 10:41 AM

I ask because there is some incredible divergence between the two for some players in my historical league. I have one guy from the 19th Century who has very pedestrian IRL stats (ERA, BB, K, etc.) who's regarded as a 5 star/5 star player in the game. I got another guy who put up All-Star level performances early in his IRL career and is regarded as a 0.5 star/0.5 star player.

How will the game likely treat these guys as their careers play out? If the historical league is based on IRL stats to inform their likely performance, by all rights the latter player should be much better than the former. Or does the game basically set all IRL stats aside when assigning the ratings to players and base performances off star ratings instead?

Brad K · 02-07-2023, 11:15 AM

Star ratings are the scout's opinion.

The game runs on the ratings shown in the editor when in commissioner mode.

Matt Arnold · 02-07-2023, 11:37 AM

When playing out stuff, IRL stats don't mean anything. It doesn't matter if a player hit .400 in a season, if his in-game contact is a 35, he's not going to play well in your league.

Now, how that can happen will depend a lot more on how the league is set up. If you're using a recalc, the guy who hit .400 likely won't have a 35 contact rating. But if your league uses the development engine, it's possible that basically in your universe, the player just never developed.

I would guess if the players are that different, even accounting for era, then this is a league that does not use recalc. In that event, you can basically treat it like a fictional universe, and the name "Babe Ruth" is basically only used to set their initial ratings when he first shows up.

chucksabr · 02-07-2023, 12:00 PM

Quote:

Originally Posted by Matt Arnold

When playing out stuff, IRL stats don't mean anything. It doesn't matter if a player hit .400 in a season, if his in-game contact is a 35, he's not going to play well in your league.

Now, how that can happen will depend a lot more on how the league is set up. If you're using a recalc, the guy who hit .400 likely won't have a 35 contact rating. But if your league uses the development engine, it's possible that basically in your universe, the player just never developed.

I would guess if the players are that different, even accounting for era, then this is a league that does not use recalc. In that event, you can basically treat it like a fictional universe, and the name "Babe Ruth" is basically only used to set their initial ratings when he first shows up.

The league does use a recalc (3-yr double based on, I believe, neutralized stats), so I did assume the ratings would be consistent with the stats. But there are these incredible divergences, which leads me to ask, does the game sometimes toss aside the IRL performance and basically remake the player in the image it wants for that particular sim, irrespective of his IRL performance? Or is it possible there is some logical explanation why a modern-day reliever with a 137 ERA+ across his first three seasons and a 111 ERA+ career shows up as a 0.5/0.5 star pitcher, while a 19c. pitcher with a sub-100 career ERA+, outside of one outlier season, shows up as a 5.0/5.0 star guy?

Brad K · 02-07-2023, 12:11 PM

Again, stars are not ratings that the game uses. They are the scout's opinion.

Matt Arnold · 02-07-2023, 12:24 PM

Quote:

Originally Posted by chucksabr

The league does use a recalc (3-yr double based on, I believe, neutralized stats), so I did assume the ratings would be consistent with the stats. But there are these incredible divergences, which leads me to ask, does the game sometimes toss aside the IRL performance and basically remake the player in the image it wants for that particular sim, irrespective of his IRL performance? Or is it possible there is some logical explanation why a modern-day reliever with a 137 ERA+ across his first three seasons and a 111 ERA+ career shows up as a 0.5/0.5 star pitcher, while a 19c. pitcher with a sub-100 career ERA+, outside of one outlier season, shows up as a 5.0/5.0 star guy?

ERA is a terrible metric to use to gauge player value?

The pitching model is mostly a FIP-based model. So players who beat their FIP will tend to vary a lot more. Someone like Darren O'Day who has a career 2.59 ERA but 3.48 FIP will tend to be rated worse than you might expect, since the game doesn't necessarily quantify why he can beat his FIP.

The other thing to keep in mind is that relievers generally speaking are expected to be better than starters. 111 ERA+ for a reliever is fine, but arguably is only an average reliever at best. If you combine a few other factors into that, it could easily explain why a guy comes out rated very poorly.

Neutralized stats can also have an impact. I know some people believe they over-compensate for era differences, so that can have a bigger impact. Also as mentioned, star ratings are a composite rating, so if you have scouting enabled, your scout might not like them. Or just the way their ratings are put together might make for bad ratings.
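
To make the ERA-versus-FIP gap concrete, here is a minimal sketch using the standard public FIP formula. The thread does not say how the game actually computes its pitching ratings, so this only illustrates why a pitcher can look better by ERA than by a FIP-style measure. The `constant` default of 3.10 is a typical ballpark value (the real constant varies by league and season), and the pitcher line is made up for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching from the standard public formula.

    `constant` is league- and season-specific; 3.10 is an assumed
    ballpark value, not a figure from the thread or the game.
    """
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def era(earned_runs, ip):
    """Earned run average: earned runs allowed per nine innings."""
    if ip <= 0:
        raise ValueError("innings pitched must be positive")
    return 9 * earned_runs / ip


# Hypothetical reliever season (made-up numbers, not a real pitcher's line).
line = {"hr": 7, "bb": 20, "hbp": 3, "k": 65, "ip": 65.0}
earned_runs = 19

print(f"ERA: {era(earned_runs, line['ip']):.2f}")
print(f"FIP: {fip(**line):.2f}")
```

A pitcher with this line allows fewer earned runs than his walks, strikeouts and home runs would predict. A model built on those components would rate him closer to the FIP figure than to the ERA figure.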

Brad K · 02-07-2023, 01:08 PM

Quote:

Originally Posted by Matt Arnold

Neutralized stats can also have an impact.

Do neutralized stats actually have a use in the game? My understanding is real stats are neutralized against the game's park factors while neutralized stats are neutralized against a different calculation of park factors.

chucksabr · 02-07-2023, 01:10 PM

Quote:

Originally Posted by Matt Arnold

ERA is a terrible metric to use to gauge player value?

The pitching model is mostly a FIP-based model. So players who beat their FIP will tend to vary a lot more. Someone like Darren O'Day who has a career 2.59 ERA but 3.48 FIP will tend to be rated worse than you might expect, since the game doesn't necessarily quantify why he can beat his FIP.

Lol funny you bring this up because I do look at FIP and not ERA, I just brought up ERA because it was the go-to on the B-ref page I looked at. The 19c pitcher has a 94 FIP- career and the modern reliever has a 96. So pretty similar, yet the star difference, the scouting that is, is hugely different which I imagine leads to the individual ratings differences in things like stuff, control, movement, etc. (19c guy is also top of the list in holding runners, defending steals, etc., while 21c. guy is bottom of the list, for who knows why.)

Anyhow, I would think pitchers of similar FIPs would have similar outcome probabilities, but I guess the game might mitigate those differences in ways that result in widely disparate ratings such as these.

Quote:

Originally Posted by Matt Arnold

The other thing to keep in mind is that relievers generally speaking are expected to be better than starters. 111 ERA+ for a reliever is fine, but arguably is only an average reliever at best. If you combine a few other factors into that, it could easily explain why a guy comes out rated very poorly.

Neutralized stats can also have an impact. I know some people believe they over-compensate for era differences, so that can have a bigger impact. Also as mentioned, star ratings are a composite rating, so if you have scouting enabled, your scout might not like them. Or just the way their ratings are put together might make for bad ratings.

Good insights here, thanks.

Garlon · 02-08-2023, 12:27 AM

If your player ratings are vastly different than what you expect from real stats for batters, then the cause is most likely your adjust and weaken import settings. As Matt mentioned, if you have development enabled your ratings will be different too. I have not seen any issues with 19th century player ratings. Are you using random debut?

Regarding neutralized stats I think we are talking about a couple different things.

1. If you use real stats, the game still makes an adjustment from the real stats. For example, if Roger Maris is given a HR rating matching 61 HR in 1961 then there could be a significant chance that he hits more than that, so they decided to put an adjustment in there.

2. If you are using neutralized stats, these are not adjusted for era. They are simply adjusted to remove the park effects for batters and pitchers.

Bobfather · 02-08-2023, 02:11 AM

Also, as in real life, it is possible for players to exceed or fall short of their ratings, since each at-bat has 100s (?) of factors that go into the outcome.

Brad K · 02-08-2023, 10:14 AM

Quote:

Originally Posted by Garlon

2. If you are using neutralized stats, these are not adjusted for era. They are simply adjusted to remove the park effects for batters and pitchers.

Are the park factors used the same as the park factors in the era_ballparks file?

David Watts · 02-08-2023, 10:22 AM

Just curious, what is wrong with Maris hitting more than 61 or even reaching 61 for that matter? Is the same thing used for McGwire, Sosa, Bonds?

chucksabr · 02-08-2023, 11:05 AM

Quote:

Originally Posted by Garlon

If your player ratings are vastly different than what you expect from real stats for batters, then the cause is most likely your adjust and weaken import settings. As Matt mentioned, if you have development enabled your ratings will be different too. I have not seen any issues with 19th century player ratings. Are you using random debut?

Regarding neutralized stats I think we are talking about a couple different things.

1. If you use real stats, the game still makes an adjustment from the real stats. For example, if Roger Maris is given a HR rating matching 61 HR in 1961 then there could be a significant chance that he hits more than that, so they decided to put an adjustment in there.

2. If you are using neutralized stats, these are not adjusted for era. They are simply adjusted to remove the park effects for batters and pitchers.

Our league does use real stats and not neutralized, and the adjust setting is fixed and unchangeable, and there is in fact no weaken setting. So I gotta ask, what are the implications for this?

Does this mean our league values raw strikeout rate over relative strikeout rate? Meaning that 2019 Jose Berrios with his real 8.8 K/9 is a far better strikeout pitcher than 1941 Bob Feller with his real 6.8 K/9? Even though 2019 Berrios's neutralized K/9+ of 99 is very average, while 1941 Feller's neutralized K/9+ of 190 is one of the highest in history?

Does this also mean that 1930 Don Hurst with his .923 OPS is the equivalent of 1968 Carl Yastrzemski's .922 OPS, even though Hurst's OPS+ is only 115 and Yaz's is a league-leading 171?

I could come up with more examples, but I think you get the gist of my question. By pegging ratings to real stats versus neutralized, is our league coming to the kinds of conclusions I am suggesting here? Please understand, I'm not making value judgments on any of this, I just want to understand how it works.
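
The raw-versus-relative distinction in the Berrios and Feller example can be worked through with a little arithmetic. This sketch assumes a "plus" index is simply the player's rate divided by the league rate, times 100. Published indices may also fold in park adjustments, so treat the implied league figures as rough. The function names are hypothetical, and nothing here claims to show how the game converts stats to ratings.

```python
def plus_index(player_rate, league_rate):
    """Rate relative to league average, scaled so that 100 is average."""
    return 100 * player_rate / league_rate


def implied_league_rate(player_rate, index):
    """Invert plus_index: the league rate that would produce this index."""
    return 100 * player_rate / index


# (raw K/9, K/9+) pairs as quoted in the post above.
seasons = {"2019 Berrios": (8.8, 99), "1941 Feller": (6.8, 190)}

for name, (k9, k9_plus) in seasons.items():
    league = implied_league_rate(k9, k9_plus)
    print(f"{name}: raw K/9 {k9}, K/9+ {k9_plus}, implied league K/9 {league:.2f}")
```

Under that assumption the two leagues differ by more than a factor of two in strikeout rate. A rating built from the raw K/9 and one built from the relative index would therefore rank these two seasons in opposite order.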

uruguru · 02-08-2023, 02:06 PM

Quote:

Originally Posted by David Watts

Just curious, what is wrong with Maris hitting more than 61 or even reaching 61 for that matter? Is the same thing used for McGwire, Sosa, Bonds?

I suspect it's the concept that Maris abnormally peaked at 61 HRs, it was not a typical season for him. This is a common idea for pretty much any season in which a player sets a single-season record.

So if you wanted to account for it being atypical, you might set Maris' ratings so that repeated replays of 1961 might see Maris average 50 homers, not 61, but he would still occasionally peak around 61 in some of those sims.

The counterpoint is that if you set his ratings to average 61 homers, then he would literally hit over 61 homers in half of the 1961 simulations you ran. That would break verisimilitude for a lot of fans and make the simulation's accuracy seem suspect.
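
The "half of the simulations" point can be checked with a simple binomial model. This is a rough sketch, not the game's engine: it assumes every plate appearance is an independent trial with the same home run probability, and the 600 chances per season is an assumed round number.

```python
from math import comb


def prob_more_than(threshold, mean_hr, chances=600):
    """Exact binomial P(HR > threshold).

    Assumes `chances` independent plate appearances, each with home run
    probability mean_hr / chances. Both are simplifying assumptions.
    """
    p = mean_hr / chances
    cdf = sum(
        comb(chances, k) * p**k * (1 - p) ** (chances - k)
        for k in range(threshold + 1)
    )
    return 1 - cdf


# Rated so that he averages 61: how often does he hit MORE than 61?
print(f"rated for 61, P(more than 61): {prob_more_than(61, 61):.2f}")

# Rated so that he averages 50: how often does he still reach 61 or more?
print(f"rated for 50, P(61 or more):   {prob_more_than(60, 50):.2f}")
```

Under these assumptions a player rated to average 61 beats 61 in a bit under half of all replays. The same player rated to average 50 still reaches 61 now and then, which is the trade-off described above.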

David Watts · 02-08-2023, 02:13 PM

Quote:

Originally Posted by uruguru

I suspect it's the concept that Maris abnormally peaked at 61 HRs, it was not a typical season for him. This is a common idea for pretty much any season in which a player sets a single-season record.

So if you wanted to account for it being atypical, you might set Maris' ratings so that repeated replays of 1961 might see Maris average 50 homers, not 61, but he would still occasionally peak around 61 in some of those sims.

The counterpoint is that if you set his ratings to average 61 homers, then he would literally hit over 61 homers in half of the 1961 simulations you ran. That would break verisimilitude for a lot of fans and make the simulation's accuracy seem suspect.

That begs the question, do you set Norm Cash to never hit higher than say .289? Josh Hamilton no higher than .298?

uruguru · 02-08-2023, 02:28 PM

Quote:

Originally Posted by David Watts

That begs the question, do you set Norm Cash to never hit higher than say .289? Josh Hamilton no higher than .298?

It depends. Are you trying to recreate 1961, or are you trying to recreate Roger Maris?

If the former, then Maris should average 61 home runs but the variability of the sim needs to be lessened significantly to avoid the 70-homer seasons that would create in the current engine.

If the latter, then you would model 1961 in the context of Maris's career. You might create a regression line for his power over his career and then see that 1961 was way above that regression line and downplay his power in that season, trusting that the inherent variability of the sim would still occasionally produce 61-homer seasons. But in the grand scheme of his career, he should still average out to the same overall career over many simulations.

David Watts · 02-08-2023, 02:39 PM

Quote:

Originally Posted by uruguru

It depends. Are you trying to recreate 1961, or are you trying to recreate Roger Maris?

If the former, then Maris should average 61 home runs but the variability of the sim needs to be lessened significantly to avoid the 70-homer seasons that would create in the current engine.

If the latter, then you would model 1961 in the context of Maris's career. You might create a regression line for his power over his career and then see that 1961 was way above that regression line and downplay his power in that season. But in the grand scheme of his career, he should still average out to the same overall career over many simulations.

Exactly. That being said, OOTP has 1 year recalc, 3 year recalc and 5 year recalc. For the life of me, I don't know why Maris would be rated in a way to prevent him reaching or even exceeding 61 when using 1 year recalc. Nor, do I see a reason for Mickey Mantle to be rated to hit more home runs in 61 than Maris when using 1 year recalc. 3 year and 5 year recalc should do exactly what you describe by allowing Maris to come closer to his average hr total without maybe reaching the 61 outlier season.
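
The effect of the recalc width on an outlier season can be sketched with a plain windowed average. The thread does not explain how OOTP weights the seasons inside a recalc, so the unweighted, centered window here is an assumption, as is working on raw home run counts rather than rates. Apart from the 61 from the thread, the season totals are placeholder values for illustration.

```python
def window_average(values_by_year, year, width):
    """Mean of the seasons in a `width`-year window centered on `year`.

    Unweighted and centered by assumption; years missing from the data
    are skipped.
    """
    half = width // 2
    window = [
        values_by_year[y]
        for y in range(year - half, year + half + 1)
        if y in values_by_year
    ]
    if not window:
        raise ValueError("no seasons fall inside the window")
    return sum(window) / len(window)


# Illustrative home run totals around a spike season (placeholders except 61).
hr = {1959: 16, 1960: 39, 1961: 61, 1962: 33, 1963: 23}

for width in (1, 3, 5):
    print(f"{width}-year window for 1961: {window_average(hr, 1961, width):.1f}")
```

With a 1-year window the 1961 input is the spike itself. The wider windows pull it toward the surrounding seasons, which is the smoothing described above for 3-year and 5-year recalc.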

Brad K · 02-08-2023, 02:41 PM

The point is OOTP devs are married to randomness and regularly flash the purity ring while selectively and exclusively committing adultery when it comes to home run totals. We are given examples of how a cap could be gamed without an acknowledgement that reduced HR ratings can be gamed too.

uruguru · 02-08-2023, 02:50 PM

Quote:

Originally Posted by David Watts

Exactly. That being said, OOTP has 1 year recalc, 3 year recalc and 5 year recalc. For the life of me, I don't know why Maris would be rated in a way to prevent him reaching or even exceeding 61 when using 1 year recalc. Nor, do I see a reason for Mickey Mantle to be rated to hit more home runs in 61 than Maris when using 1 year recalc. 3 year and 5 year recalc should do exactly what you describe by allowing Maris to come closer to his average hr total without maybe reaching the 61 outlier season.

As you can imagine, there are always two big problems with baseball simulations.

1) the data model is complex - at a very basic level, there is pitcher vs batter, defense, park and era effects. These things all have to be accounted for across a wide variety of events. At a deeper level, there is a muddiness in what modeled baseball events actually represent... are home runs modeled independently, or are they grouped with fly outs and extra base hits? If you do not model accurately, you have a real problem with 2) below.

2) baseball fans are generally very knowledgeable about stats and have a low tolerance for perceived inaccuracies in a simulation. How do you create randomness without excessive variability that would turn away fans? Roger Maris hitting 53, 57, or 62 home runs in 1961 is ok. But 65? That is absurd!

uruguru · 02-08-2023, 02:58 PM

Quote:

Originally Posted by Brad K

The point is OOTP devs are married to randomness and regularly flash the purity ring while selectively and exclusively committing adultery when it comes to home run totals. We are given examples of how a cap could be gamed without an acknowledgement that reduced HR ratings can be gamed too.

I've developed a (non-baseball) strategy game and I can tell you first-hand writing a competent AI is extremely difficult and that RNG elements are the easiest way to conceal AI shortcomings.
