Perfect Team 21 - The online revolution! Battle tens of thousands of PT managers from all over the world and become a legend.

07-02-2020, 09:34 PM #1
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
Does the OOTP ratings team ever validate the ratings?

Does the OOTP ratings team ever validate the ratings? What I mean is this: if you calculate the ratings for each player in a specific season and sim that season 100x (or 1,000x, etc., to get an appropriate sample size), each player's average stats should come out pretty close to his actual performance in that season. Do this for every season and you would have a realistic rating for each player in baseball history. Then you could assign appropriate ratings to players, making it possible to pit a deadball-era player against a modern-era player without having to add these "league totals," and get a realistic representation of what would happen if Babe Ruth faced Clayton Kershaw.

Has this ever happened or even been discussed?
07-03-2020, 02:35 AM #2
Lemandria
All Star Reserve
Join Date: Sep 2019
Location: Chicagoland
Posts: 702
They don't have to "sim" anything; they have server-level access to every game played in any league anywhere. Tens of thousands of games is a pretty good sample size.

Number-cruncher's dream.

Aaannnnnddd they've all signed NDAs, so whether the sims were wildly inaccurate or perfect to the seventh decimal place, they aren't gonna discuss it. But is that a sensible question anyway? Statisticians spend a lot of time wrangling over comparisons between players from entirely different eras; whose word are you going to accept for what 'accurate' means in a case like that?

When they can create cards that easily outplay any season their 'real' exemplars ever had, historical inaccuracy goes without saying. They do create better-than-historical-best cards, every single year.

So yes, within the implicit compromises they've accepted in the name of monetary feasibility, they are 100.0000% accurate.
__________________
FOTF victim
Farewell

Last edited by Lemandria; 07-03-2020 at 03:11 AM.
07-03-2020, 12:25 PM #3
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
The data collected from Perfect Team sims is not relevant if the ratings aren't correct going in.

What I am saying is this: create a solo game for a specific year with the ratings generated for all players (which I assume come from their algorithm). Using historical lineups and transactions, sim that season 100 times. If you add up each player's stats from those 100 sims and find his average season totals, they should closely approximate his actual performance in that season. Do that for all players in all seasons, and you could calculate more accurate ratings for players.

What it feels like is happening right now is that a close-enough rating is applied to players, something that "feels about right." However, if you used those players' ratings in a solo season for that specific year, you would get nothing close to their actual performance.
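The proposal above amounts to a Monte Carlo check: turn ratings into per-plate-appearance probabilities, sim the season many times, and compare each player's average line with what he actually did. Here is a minimal sketch of that loop for a single hitter. Everything in it is an assumption for illustration: the outcome probabilities in `EXAMPLE_PROBS` are invented, plate appearances are treated as independent draws, and there is no opposing pitcher, park, or lineup. OOTP's actual engine and ratings conversion are not public and are not modeled here.

```python
import random
from statistics import mean

# Hypothetical per-plate-appearance outcome probabilities for one hitter.
# In the real proposal these would come from the ratings algorithm; here
# they are made-up numbers, not OOTP's formulas.
EXAMPLE_PROBS = {
    "1B": 0.150, "2B": 0.050, "3B": 0.005, "HR": 0.035,
    "BB": 0.100, "K": 0.150, "OUT": 0.510,  # OUT = out on a ball in play
}

HIT_TYPES = ("1B", "2B", "3B", "HR")


def simulate_season(probs, plate_appearances, rng):
    """Draw one season of independent plate appearances; return outcome counts."""
    outcomes = list(probs)
    weights = [probs[o] for o in outcomes]
    counts = dict.fromkeys(outcomes, 0)
    for outcome in rng.choices(outcomes, weights=weights, k=plate_appearances):
        counts[outcome] += 1
    return counts


def batting_average(counts):
    hits = sum(counts[h] for h in HIT_TYPES)
    at_bats = sum(counts.values()) - counts["BB"]  # walks are not at-bats
    return hits / at_bats


def validate(probs, plate_appearances, actual_avg, n_sims=100,
             tolerance=0.010, seed=0):
    """Sim the season n_sims times and compare the mean average to the real one."""
    rng = random.Random(seed)
    simulated = mean(
        batting_average(simulate_season(probs, plate_appearances, rng))
        for _ in range(n_sims)
    )
    return simulated, abs(simulated - actual_avg) <= tolerance


sim_avg, close_enough = validate(EXAMPLE_PROBS, 650, actual_avg=0.267)
print(f"simulated {sim_avg:.3f} vs actual .267, within tolerance: {close_enough}")
```

With 100 sims the sampling noise on the mean average is already small (roughly .002 for these made-up numbers), so a gap larger than the tolerance would point at the ratings-to-probabilities step rather than at luck.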
07-03-2020, 08:12 PM #4
mcdog512
Hall Of Famer
Join Date: Dec 2018
Location: Pack Robert Gibson; November 9, 1935 – October 2, 2020
Posts: 2,339
Quote:
Originally Posted by Hoover36
The data collected from perfect team sim's is not relevant if the ratings aren't correct going in.

What I am saying is that if you created a specific year solo game with the ratings generated for all players (which I assume is generated by their algorithm). Using historical lineup and transactions sim that season 100x times. When you add up all the stats per player from those 100 sim and found the average season totals for each player, they should come within a close approximation of that players performance in that specific season. Do that for all players, all season, you could calculate more accurate ratings for players.

What it feels like is happening right now is a close enough rating is applied to players. Something that "feels about right". However if you used the ratings for those players in a solo season for that specific year, you get nothing close to actual performance.

I hear ya, although this is an online card pack mode, not the OOTP base game. Accurate ratings are certainly important for immersion, but strictly speaking they're not super important to the overall game.
07-04-2020, 12:22 AM #5
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
Quote:
Originally Posted by mcdog512
I hear ya, although this is an online card pack mode, not OOTP base game. Accurate ratings are important for sure for immersion but strictly not super important in the overall game.

I think that depends on who you ask. I would venture to guess that MOST would argue accurate ratings are super important.

I see a thread every day from folks asking the devs to look into ratings of some sort.
07-04-2020, 09:10 AM #6
Syd Thrift
Hall Of Famer
Join Date: May 2004
Posts: 10,040
Quote:
Originally Posted by Hoover36
The data collected from perfect team sim's is not relevant if the ratings aren't correct going in.

What I am saying is that if you created a specific year solo game with the ratings generated for all players (which I assume is generated by their algorithm). Using historical lineup and transactions sim that season 100x times. When you add up all the stats per player from those 100 sim and found the average season totals for each player, they should come within a close approximation of that players performance in that specific season. Do that for all players, all season, you could calculate more accurate ratings for players.

What it feels like is happening right now is a close enough rating is applied to players. Something that "feels about right". However if you used the ratings for those players in a solo season for that specific year, you get nothing close to actual performance.

They use an algorithm to create ratings out of the stats. They’ve been using the same basic algorithm for nearly 20 years, and it comes pretty close for pretty much any season at this point. They don’t spend that kind of time beta testing individual historical seasons because that kind of curated season content is not what they sell (that’s more Strat-O-Matic’s thing). Besides, if the model works, all that running simulations against it does is prove that the model works. And if it doesn’t, you don’t change individual ratings; you figure out what caused the model to churn out bad stats and fix it. I am *sure* this kind of generalized testing has been done.
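The "fix the model, not individual ratings" idea suggests a different kind of check: instead of judging players one at a time, look for residuals (simulated minus actual) that line up with a player attribute. A sketch, assuming you already have per-player simulated and actual averages; the record layout, the `k_rate` field, and the 20% cutoff are all hypothetical choices for illustration.

```python
from collections import defaultdict
from statistics import mean


def mean_residual_by_group(players, group_of):
    """Average (simulated - actual) batting average for each group of players.

    `players` is a list of dicts with hypothetical keys 'sim_avg',
    'actual_avg' and 'k_rate'; `group_of` maps a player to a group label.
    """
    residuals = defaultdict(list)
    for p in players:
        residuals[group_of(p)].append(p["sim_avg"] - p["actual_avg"])
    return {group: mean(values) for group, values in residuals.items()}


def k_bucket(player, threshold=0.20):
    """Split hitters by strikeout rate; the 20% cutoff is arbitrary."""
    return "high-K" if player["k_rate"] >= threshold else "low-K"


# Made-up records, only to show the shape of the check.
players = [
    {"sim_avg": 0.250, "actual_avg": 0.280, "k_rate": 0.25},
    {"sim_avg": 0.240, "actual_avg": 0.270, "k_rate": 0.22},
    {"sim_avg": 0.300, "actual_avg": 0.298, "k_rate": 0.10},
    {"sim_avg": 0.285, "actual_avg": 0.287, "k_rate": 0.12},
]
print(mean_residual_by_group(players, k_bucket))
```

If one class of hitter (say, high-strikeout sluggers) comes out consistently low while the others center on zero, that points at how the model handles that trait rather than at any single player's ratings.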

PT, of course, is not running against some kind of historical baseline. Your 1930 Hack Wilson doesn’t get to play against 1930 Claude “Weeping” Willoughby or Leo Strickland (who, IIRC, still holds the record for most IP with more runs allowed than IP). Playing nothing but stars vs. stars is going to do screwy things with the numbers, even if the ratings were originally “right” for the era/season/etc.
__________________
Quote:
Originally Posted by Markus Heinsohn
You bastard....
The Great American Baseball Thrift Book - Like reading the Sporting News from back in the day, only with fake players. REAL LIFE DRAMA THOUGH maybe not
07-04-2020, 09:26 AM #7
Lemandria
All Star Reserve
Join Date: Sep 2019
Location: Chicagoland
Posts: 702
The sort of empirical testing he describes is unlikely to closely resemble whatever OOTP uses to benchmark their accuracy.

But it does sound like the sort of thing an end user would attempt; perhaps this query should be redirected to the OOTP base game forums? It's about the base engine's accuracy vs. historical stats, right?

And the dev team is unlikely to be able to discuss much about their internal testing.

They're satisfied, certainly.
Last edited by Lemandria; 07-04-2020 at 09:27 AM.
07-04-2020, 09:26 AM #8
mcdog512
Hall Of Famer
Join Date: Dec 2018
Location: Pack Robert Gibson; November 9, 1935 – October 2, 2020
Posts: 2,339
Quote:
Originally Posted by Hoover36
I think that depends on who you ask. I would venture to guess that MOST would argue accurate ratings are super important.

I see a thread every day in regards to folks asking that the devs look into ratings of some sort.

I wouldn't say they are unimportant. Who would want to play a game where Mario Mendoza is far better than Ted Williams? That said, as long as they are in the ballpark, so to speak, I can factor them into my playing and purchasing decisions.
07-04-2020, 10:27 AM #9
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
Quote:
Originally Posted by Syd Thrift
They use an algorithm to create ratings out of the stats. They’ve been using the same basic algorithm for nearly 20 years and it comes pretty close for pretty much any season at this point.

I disagree; it does not come pretty close. It is "ballpark" at best. It is the sole reason I stopped buying OOTP regularly. Perfect Team has brought me back (for a while), and as long as I find enjoyment in PT I will continue to purchase OOTP. However, if the ratings accuracy doesn't improve, I will slowly lose interest once again and go back to playing only 6.5.

Quote:
Originally Posted by Syd Thrift
Playing nothing but stars vs stars is going to do screwy things with the numbers, even if the ratings were originally “right” for the era/season/etc.

Agreed. But even when playing stars vs. stars, something is "off" when Ichiro still "feels" like Ichiro while Babe feels like Adam Dunn.
07-04-2020, 12:30 PM #10
RonCo
Hall Of Famer
Join Date: Aug 2003
Posts: 9,499
Quote:
Originally Posted by Hoover36
But even when playing stars vs stars, something is "off" when Ichiro still "feels" like Ichiro, while Babe feels like Adam Dunn.

That's an interesting comparison. By raw stats alone, it seems farcical that these two could be similar players. But when you realize that the Babe led the league in strikeouts in many years, you begin to scratch your head. Relative to his peers, the Babe had a very high strikeout rate, but his peers' rates were not so high (Ks in the 1920s and 1930s were quite low, relatively speaking). Put the Babe into a modern setting where he still leads the league in Ks, and his batting average and doubles will plummet because fewer balls get put into play, and his HR will drop for similar reasons. If his Ks double and his batting average plummets ... well ...

Likewise, put Adam Dunn and his well-above-average power into an era where Ks don't happen much, and his Ks fall off the table while his average, doubles, and HR rise.

I admit I'm now speaking from my general experience, but one of the "issues" with the PT environment, as I understand it, is that it uses a fairly modern era as its baseline (adjusting everyone to that modern era). That era happens to be very much Ichiro's, not so much the Babe's. I wouldn't have thought about it until you said it, but when I think about it closely, it doesn't strain credibility to say the Babe would be like Adam Dunn in today's high-K world. That's kind of interesting, really.

That doesn't make it more fun to own Babe Ruth in these games, but it's an interesting conversation piece.

Last edited by RonCo; 07-04-2020 at 02:21 PM.
07-04-2020, 02:36 PM #11
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
Ichiro was an anomaly in his era, the same way Ruth was in his. To allow one to thrive while the other is relegated to reserve rosters still doesn't "feel" right. I agree it is an interesting conversation. However, ignore Ruth for now and swap in Trout, who also plays in this era. Trout, similar to Ruth, can't hold his own... yet he is the best player in baseball in this era. That sounds like a ratings issue in general, or an engine issue with the treatment of Avoid K's.
07-04-2020, 02:46 PM #12
RonCo
Hall Of Famer
Join Date: Aug 2003
Posts: 9,499
Quote:
Originally Posted by Hoover36
Ichiro was an anomaly during his playing time...Same way Ruth was. To allow one to thrive and the other to be relegated reserve rosters still doesn't "feel" right. I agree it is an interesting conversation. However, ignore Ruth for now and swap in Trout who also plays in this era. Trout, similar to Ruth, can't hold his own...yet he is the best player in baseball in this era. That sounds like a ratings issue in general or an engine issue with treatment of Avoid K's.

You're probably right. The bottom line is that there will be stats warping due to era differences, and stats warping due to ratings within the population varying from the "norm." There will also be natural random variation, which can be quite severe to our eyes. Add that to the engine warp, and one can get some occasionally weird-looking results, too.
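To put a rough number on "natural random variation," here is a simple binomial sketch: it treats every at-bat as an independent trial with the same hit probability, which is an assumption (real seasons have more sources of variation, and the PT engine is not modeled here). The .280 hitter and 550 at-bats are hypothetical inputs.

```python
import math


def batting_average_sd(true_avg, at_bats):
    """Chance-only standard deviation of a season batting average.

    Treats every at-bat as an independent trial with the same hit probability
    (a binomial model); real seasons have more sources of variation than this.
    """
    return math.sqrt(true_avg * (1 - true_avg) / at_bats)


def two_sigma_range(true_avg, at_bats):
    """Range that covers roughly 95% of seasons under the binomial model."""
    sd = batting_average_sd(true_avg, at_bats)
    return true_avg - 2 * sd, true_avg + 2 * sd


low, high = two_sigma_range(0.280, 550)
print(f"A true .280 hitter over 550 AB: about {low:.3f} to {high:.3f}")
```

Even under this idealized model, a true .280 hitter lands anywhere from about .242 to .318 in a typical season from luck alone.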

Like I said, though, I'm out of my experience bubble with deep PT stuff.
07-04-2020, 03:03 PM #13
ubernoob
Minors (Triple A)
Join Date: Mar 2007
Posts: 288
Quote:
Originally Posted by Hoover36
Ichiro was an anomaly during his playing time...Same way Ruth was. To allow one to thrive and the other to be relegated reserve rosters still doesn't "feel" right. I agree it is an interesting conversation. However, ignore Ruth for now and swap in Trout who also plays in this era. Trout, similar to Ruth, can't hold his own...yet he is the best player in baseball in this era. That sounds like a ratings issue in general or an engine issue with treatment of Avoid K's.

It's an issue of league normalization.

High-BABIP/low-power players translate better to PT because of this.
07-04-2020, 03:06 PM #14
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
I actually like the stat warping due to the era difference. When I used to run leagues, this setup of pitting players from all eras against one another was my longest-running league, going some 70+ seasons. I am familiar with a Ruth who from 1920-1932 faced pitchers with an average of 30 Stuff, 70 Control and 87 Movement, resulting in his .356 average, 46 HR and 75 Ks. Pit him against (I'll assume all SEs) 100 Stuff, 85 Control, 81 Movement and his Ks would go up and his average would go down... but not from .356 to .200. His power shouldn't be all that affected; he'd be getting fewer hits, but he'd be hitting against pitchers with lower Movement than he faced back then.

Anyway, I would really love to see the ratings validated against the players they actually faced in those specific years. It would make the stat output from the star-vs-star setup much more enjoyable and realistic.
07-04-2020, 03:13 PM #15
Hoover36
Minors (Double A)
Join Date: Mar 2003
Location: NV
Posts: 187
Quote:
Originally Posted by ubernoob
It's an issue of league normalization.

High BABIP/Low Power players translate to PT better due to this.

It's somewhat of a cop-out to just say that without any data or anecdotal evidence to support the statement.

Saying it's just league normalization doesn't explain why Mike Trout gets hammered similarly to Ruth. Either Trout's ratings are wrong or the engine's handling of Avoid K's has an issue.

There are hundreds of different ways to build a team, whether in real life or in a sim. Forcing everyone to use only high-Avoid K, high-Contact players shows a flaw in the system.

Quite frankly, this game is far too exceptional in every other regard to simply ignore this one flaw.
07-04-2020, 03:19 PM #16
RonCo
Hall Of Famer
Join Date: Aug 2003
Posts: 9,499
Quote:
Originally Posted by Hoover36
I actually like the stat warping due to the era difference. When I used to run leagues, this set up of pitting players from all era's against one another was my longest running league going some 70+ seasons. I am familiar with a Ruth who from 1920-1932 faced pitchers with an average of 30 Stuff, 70 Control and 87 Movement which would result in his .356 average 46 HR and 75 K's. Pitting him up against (I'll assume all SE's) 100 Stuff, 85 Control, 81 Movement, his K's would go up, average would go down...but not from .356 to .200. His power shouldn't be all that affected, he'd be getting less hits, but he is hitting against pitchers with lower movement than he faced.

You might be surprised by how far Ruth's average would fall. If you take his numbers, double his K rate (which isn't unreasonable given the K-rate difference between eras), and keep his HR, BABIP, and walk rates essentially the same, his career batting average falls to something like .237. I did that exercise some time back, so I could be a little off on that .237, but it's close. The number .244 comes to mind, too. Whatever the exact figure, the difference was so startling that I went back and triple-checked my numbers at the time.
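The exercise can be written as a formula: batting average = (HR + BABIP × (AB − HR − K)) / AB, so doubling strikeouts removes balls in play and, with them, hits. A sketch, assuming a simplified BABIP with no sacrifice flies and holding at-bats, HR and BABIP fixed. The season line is a hypothetical round number, not Ruth's, so it does not try to reproduce the .237 figure; how far the average falls depends heavily on the hitter's starting K and BABIP.

```python
def babip(at_bats, hits, home_runs, strikeouts):
    """Batting average on balls in play (simplified: sacrifice flies ignored)."""
    return (hits - home_runs) / (at_bats - home_runs - strikeouts)


def average_with_k_multiplier(at_bats, hits, home_runs, strikeouts, k_multiplier):
    """Re-estimate batting average after scaling strikeouts by k_multiplier,
    holding at-bats, home runs and BABIP fixed (walks are outside at-bats,
    so an unchanged walk rate needs no extra handling)."""
    rate = babip(at_bats, hits, home_runs, strikeouts)
    balls_in_play = at_bats - home_runs - strikeouts * k_multiplier
    return (home_runs + rate * balls_in_play) / at_bats


# Round, hypothetical season (not Ruth's actual line): 500 AB, 170 H, 40 HR, 80 K.
before = average_with_k_multiplier(500, 170, 40, 80, 1.0)
after = average_with_k_multiplier(500, 170, 40, 80, 2.0)
print(f"{before:.3f} -> {after:.3f}")
```

For this made-up hitter, doubling the strikeouts alone takes a .340 average down to about .285.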

The Stuff/AvK matchup is a bigger driver than the MOV/Power matchup because (unless the warp is HHHHUUUUUGGGGEEE) there are a lot fewer HR than Ks in an average game. In other words, a 20% warp in HR affects one event every two or three games, whereas a 20% warp in K (and BABIP, for that matter) affects 2-3 plays per game.
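The arithmetic behind that comparison, as a sketch. The per-game rates are rough assumptions for a modern high-K season with both teams combined (about 2 HR and 14 K per game); they are not figures from the post or from PT.

```python
def shifted_events_per_game(events_per_game, warp):
    """How many events per game change when an event's rate shifts by `warp` (0.20 = 20%)."""
    return events_per_game * warp


# Assumed league-wide rates per game, both teams combined (approximate).
HR_PER_GAME = 2.0
K_PER_GAME = 14.0

hr_shift = shifted_events_per_game(HR_PER_GAME, 0.20)
k_shift = shifted_events_per_game(K_PER_GAME, 0.20)
print(f"20% HR warp: one changed event every {1 / hr_shift:.1f} games")
print(f"20% K warp: {k_shift:.1f} changed plate appearances per game")
```

Under those assumptions the HR warp touches one event every 2.5 games and the K warp about 2.8 plate appearances per game, in line with the estimate above.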

Last edited by RonCo; 07-04-2020 at 03:20 PM.
07-04-2020, 03:37 PM #17
CBeisbol
Banned
Join Date: Aug 2019
Location: Ban land in 3...2...
Posts: 2,943
Ruth, in 1927, had a K%+ of 214. He struck out over twice as often as the league-average hitter.

Move him to 2011 (I think that's the year people have said the PT stats are based on). No player in 2011 had a K%+ higher than 200. The highest was Mark Reynolds at 176, which was a 31% K rate. If Ruth struck out at more than twice the league rate in 2011, that would be something like a 40% strikeout rate.

Hard to hit for a high average when you're doing that.
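The back-of-the-envelope math here, sketched out. It assumes K%+ means 100 × player K% / league K%, and it backs the 2011 league rate out of the Reynolds figures as quoted; both 31% and 176 are rounded, so the result is approximate.

```python
def k_pct_plus(player_k_rate, league_k_rate):
    """K%+: strikeout rate relative to league average, scaled so 100 = average."""
    return 100 * player_k_rate / league_k_rate


def implied_league_rate(player_k_rate, player_k_pct_plus):
    """League K rate implied by one player's K rate and K%+."""
    return player_k_rate / (player_k_pct_plus / 100)


def project_k_rate(k_pct_plus_value, target_league_k_rate):
    """K rate a hitter would post if he kept the same K%+ in another league."""
    return k_pct_plus_value / 100 * target_league_k_rate


league_2011 = implied_league_rate(0.31, 176)   # from Reynolds' quoted line
ruth_in_2011 = project_k_rate(214, league_2011)
print(f"Implied 2011 league K rate: {league_2011:.1%}")
print(f"Ruth's 1927 K%+ applied to 2011: {ruth_in_2011:.1%}")
```

That lands in the high 30s (about 37.7%), which is the "something like 40%" in the post.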
07-04-2020, 03:50 PM #18
RonCo
Hall Of Famer
Join Date: Aug 2003
Posts: 9,499
Quote:
Originally Posted by ubernoob
It's an issue of league normalization.

High BABIP/Low Power players translate to PT better due to this.

Just doing mental gymnastics (and noting again that I have ZERO real familiarity with PT), I'd guess Power is a small factor, but AvK and BABIP have a higher impact in any translation from era to era. Mike Trout is a fairly high-K-rate guy over his career, and translating him back even ten years will warp him in ways that could be big enough to noticeably affect his resulting batting average.

This isn't a design "flaw," as Hoover36 is calling it, so much as an indication of the real-world issue that moving players from era to era will naturally squeeze the results in ways that might feel unnatural. At the end of the day, there are only so many plate appearances. The designer has to find the least offensive way to put all that toothpaste back into the tube, and there won't be anything that's "right."

Last edited by RonCo; 07-04-2020 at 03:52 PM.
07-04-2020, 04:17 PM #19
ubernoob
Minors (Triple A)
Join Date: Mar 2007
Posts: 288
Quote:
Originally Posted by RonCo
Just doing mental gymnastics (and noting again, that I have ZERO real familiarity to PT), I'd guess Power is a small factor, but that AvK and BABIP have higher impact in any translation from era to era. Mike Trout is a fairly high K-rate guy over his career, and translating him back even ten years will warp him in ways that could be big enough to affect his resulting batting average in noticeable ways.

This isn't a design "flaw," as Hoover36 is calling it, so much as it is an indication of the real world issue that moving players from era to era will naturally result in a squeezing of the results in some ways that might feel unnatural. At the end of the day, there are only so many plate appearances. The designer has to find the least offensive way to put all that toothpaste back into the tube, and there won't be anything that's "right."

No, it's the fact that there can only be so many HRs in any given league (+/- a small amount), and everyone with power is fighting it out for those HRs. So when they hit far fewer than normal because of this, their average plummets, because they aren't hitting singles or doubles to make up for the lost HRs.

Dropping from 50-60 HR to 20-30 is 30 lost hits a year; that's a big chunk of average for sluggers with a good eye.
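The arithmetic in that last line, as a sketch. It takes the post's premise at face value (the missing home runs become outs and nothing replaces them), and the hitter's line of 180 hits in 550 at-bats is hypothetical.

```python
def average_after_lost_home_runs(hits, at_bats, hr_before, hr_after):
    """Batting average if lost home runs simply become outs.

    Follows the premise in the post: nothing (no extra singles or doubles)
    replaces the missing home runs, and at-bats stay the same.
    """
    lost_hits = hr_before - hr_after
    return (hits - lost_hits) / at_bats


# Hypothetical slugger: 180 hits in 550 AB, home runs fall from 55 to 25.
before = 180 / 550
after = average_after_lost_home_runs(180, 550, 55, 25)
print(f"{before:.3f} -> {after:.3f}")  # prints 0.327 -> 0.273
```

Thirty hits over 550 at-bats works out to roughly 55 points of batting average.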
Last edited by ubernoob; 07-04-2020 at 04:20 PM.
07-04-2020, 04:28 PM #20
RonCo
Hall Of Famer
Join Date: Aug 2003
Posts: 9,499
Quote:
Originally Posted by ubernoob
No, it's the fact that there can only be so many HRs in any given league (+/- a small amount) and everyone with power is fighting it out for those HRs. So when they hit way less than normal due to this, their average plummets because they aren't hitting singles or doubles to make up for the lost HRs.

Again, as in the other thread: your assumption of distributing a set number of HR is not actually how the base engine works. I know it can appear that way, but I'm about as sure as a non-developer can be that, as long as the PT game engine works like the base game in its basic function, you're not correct in that assessment.