OOTP 25 - General Discussions Everything about the brand new 25th Anniversary Edition of Out of the Park Baseball - officially licensed by MLB, the MLBPA, KBO and the Baseball Hall of Fame.

Old 07-07-2024, 10:53 AM   #1
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
quibbles and questions about OOTP WAR implementation

Hi, sorry for the long post. I have a few questions about the way WAR is implemented in-game that I was hoping the developers (or others) might be able to comment on...

There has been some discussion on some of these points over the years, including this https://forums.ootpdevelopments.com/...d.php?t=243191 (from 10 years ago!) but many of the same questions remain...

To be clear, I'm not questioning the utility of WAR, more just trying to piece together how OOTP calculates it. And would love to have more clarity from the developers on how they do this! And even better, have this show up in the manual or in a tooltip.

To study this, I put together a fictional 30-team test league with DH, and otherwise out-of-the-box settings. I zeroed out all the parks to minimize the effects of park factors. I then simmed a season and collected the stats.

The first thing I was interested in was whether OOTP accurately captured the split of WAR between position players and pitchers. Fangraphs and Baseball Reference define this as 570 WAR for position players and 430 WAR for pitchers, adding up to 1000 WAR total across 30 teams. (This assumes a replacement-level winning percentage of .294, or 47.7 wins for a team composed entirely of replacement-level players.) OOTP does decently well on this: the league totals worked out to 573.4 WAR for position players and 447.0 fWAR for pitchers, a little high for the pitchers, but not too bad. One thing to note here is that the "Team Statistics" page miscalculates (and grossly overestimates) pitching WAR. That page says the teams added up to 583.1 fWAR and 588.9 rWAR, which I think is related to using a different replacement level, so I went back to the individual team summaries, which had the correct sums of the individually listed fWARs.
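For anyone checking the arithmetic, here is a minimal sketch of how the 1000-WAR pool and the replacement level relate to each other. The function name and its defaults are made up for illustration; nothing here comes from OOTP itself.

```python
def replacement_wins_per_team(total_war=1000.0, teams=30, games=162):
    """Wins for an all-replacement team, given the league-wide WAR pool."""
    # Every game has exactly one winner, so the league wins teams * games / 2.
    league_wins = teams * games / 2
    # Whatever is not credited as WAR is what replacement players would win.
    return (league_wins - total_war) / teams


wins = replacement_wins_per_team()
print(round(wins, 1), round(wins / 162, 3))  # 47.7 0.294
```

The same function can be fed the in-game totals (573.4 + 447.0) to see how far the implied replacement level drifts from .294.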

The second thing to note is that there are some errors in the implementation of several baserunning and fielding metrics. I added up the totals of UBR and BsR for each team, and they worked out to -714.6 and -714.9 respectively, but they should be zero. Likewise, zone ratings for pitchers added up to a total of 74.9 and should be zero. Framing runs for catchers added up to 224.6 runs and arm runs for outfielders added up to 354.1 runs; both of those should be zero also. (Catching arm or blocking runs don't seem to be calculated at all, or at least are not listed.) By contrast, the sums of batting runs (-0.8), wRAA (13.6) and most other zone ratings were pretty close to zero, as they should be.
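The zero-sum check itself is simple to reproduce. The sketch below uses the league totals quoted above; the 25-run tolerance is an arbitrary assumption, chosen only to separate the "close to zero" metrics from the rest.

```python
# League-wide totals reported above for the one-season test league.
league_totals = {
    "UBR": -714.6,
    "BsR": -714.9,
    "Pitcher ZR": 74.9,
    "Framing runs": 224.6,
    "OF arm runs": 354.1,
    "Batting runs": -0.8,
    "wRAA": 13.6,
}


def off_center(totals, tolerance=25.0):
    """Return the metrics whose league total is further from zero than tolerance."""
    return {name: total for name, total in totals.items() if abs(total) > tolerance}


print(sorted(off_center(league_totals)))
# ['BsR', 'Framing runs', 'OF arm runs', 'Pitcher ZR', 'UBR']
```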

The next thing I'll mention is that it's not entirely clear to me how wOBA or FIP are derived in-game. Both of these are somewhat complicated to calculate, requiring annual re-assessment of several constants based on the league's run environment. League wOBA is scaled, by definition, to equal league OBP. In this test, the league's OBP was .324, so anything above a .324 wOBA should end up with positive wRAA, but the actual cut point seemed to be around .318. The cut points for replacement-level FIP in this test seemed to be around 5.38 for starting pitchers and 4.53 for relievers.
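To make the break-even point concrete, here is a sketch of the usual wRAA formula, (wOBA - league wOBA) / wOBA scale * PA. The wOBA scale of 1.2 is a placeholder assumption, not a value taken from the game, and the .320 hitter is hypothetical.

```python
def wraa(woba, league_woba, pa, woba_scale=1.2):
    """Runs above average from wOBA; zero exactly at the league baseline."""
    return (woba - league_woba) / woba_scale * pa


# A hypothetical .320 wOBA hitter over 600 PA, judged against two baselines.
print(wraa(0.320, 0.324, 600) < 0)  # True: below league OBP, so negative is expected
print(wraa(0.320, 0.318, 600) > 0)  # True: positive if the baseline is really ~.318
```

A hitter between .318 and .324 is the interesting case: the sign of his in-game wRAA shows which baseline the game is actually using.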

So the question that comes up for me here is: does OOTP use a fixed formula for calculating wOBA and FIP (the inputs to WAR), or does it actually recalculate the constants from year to year? I somewhat doubt it does the latter, but I struggled to figure out what it's actually doing. I tried comparing the in-game results against a "default" wOBA formula [(0.7*(BB+HBP)+0.9*1B+1.25*2B+1.6*3B+2.0*HR)/PA], a version incorporating intentional walks and sac flies, and a version based on 1.8*OBP+SLG. All of these got me close to the wOBA numbers the game uses, but not exactly there. Likewise for FIP: I can calculate a "FIP constant" that makes the league FIP equal the league ERA (as it's defined), but using that constant gets me only close-ish to the in-game FIPs for individual players, not exactly there.
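For reference, the fixed-weight "default" wOBA formula quoted above looks like this in code. The stat line is invented purely for illustration, and the weights are the ones from the post, not anything confirmed to be in the game.

```python
def default_woba(bb, hbp, singles, doubles, triples, hr, pa):
    """Fixed-weight wOBA as quoted above, with no IBB or SF adjustment."""
    numerator = (0.7 * (bb + hbp) + 0.9 * singles + 1.25 * doubles
                 + 1.6 * triples + 2.0 * hr)
    return numerator / pa


# Hypothetical stat line, made up for the example.
print(round(default_woba(bb=60, hbp=5, singles=100, doubles=30,
                         triples=3, hr=20, pa=650), 3))  # 0.335
```

Running this over every hitter and comparing against the in-game wOBA column would show how far off the fixed weights are.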

The other thing I wonder about is the effects of park factors on all this. I looked at one team, which ended up scoring and allowing 609 runs at home, and 706 runs on the road. Again, this was in a league where all park factors were neutralized. I don't know if OOTP makes adjustments to WAR based on the "theoretical" park factors (which would mean no adjustment in this specific case) or the "measured" park factors (which would mean position players for this team would get a boost for playing in a park that "penalizes" run scoring).
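As a rough illustration of what a "measured" park factor would be for that team, here is a single-season, unregressed ratio. It assumes the 609 and 706 figures are combined runs (scored plus allowed) and that the team played the same number of home and road games. Published park factors are usually multi-year and regressed, so treat this as a sketch only.

```python
def raw_park_factor(home_runs_total, road_runs_total):
    """Combined runs scored and allowed at home, relative to the road."""
    return home_runs_total / road_runs_total


pf = raw_park_factor(609, 706)
print(round(pf, 3))  # 0.863, i.e. the park "played" about 14% below neutral
```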

Attaching the spreadsheet collecting the data here.

So in summary, my questions are:
-How does OOTP actually go about calculating wOBA, FIP and WAR? Specifically, are there constants that it uses, or does it re-calculate these each year?
-Why is the pitching WAR listed on the "team statistics" pages different from the totals found on the individual team pages? And why is pitching WAR slightly higher than it "should" be?
-Why are UBR/BsR, pitching ZR, catching framing runs and arm runs not zeroed out?
-Does OOTP make adjustments based on "theoretical" park factors or "actual" park factors based on results?

Any clarification would be helpful! Thanks-
Attached Files
File Type: xlsx WAR test.xlsx (247.9 KB, 39 views)
Old 07-12-2024, 08:33 PM   #2
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
Bump! Any ideas on this?
Old 07-12-2024, 09:42 PM   #3
Garlon
Hall Of Famer
 
Join Date: Jun 2004
Posts: 4,371
I have been looking at Framing as well and it appears that the game is not tallying those results properly because there is always more positive than negative FRM in the league.

This is also occurring with ARM for outfielders too.

I suspect that there is something missing in the evaluation of how the game is reporting those statistics.

I am planning to report a bug again on those issues. A bug about these was logged previously and I think some changes were made, but the issue remains.

Regarding catcher blocking and arm, those are actually wrapped up in their ZR value.

Ideally league FIP should be an exact match to league ERA in any season. If they are using the shorter form of the FIP formula this will not work out but if FIP is being calculated correctly it will match the ERA of the league in any season.
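A minimal sketch of why league FIP matches league ERA when the constant is derived from league totals: the constant is simply defined as the gap. This uses the standard published form (13*HR + 3*(BB+HBP) - 2*K) / IP + constant; the league totals below are hypothetical, and IP must be true innings (thirds as fractions, not the .1/.2 box-score notation).

```python
def fip_core(hr, bb, hbp, k, ip):
    """The component part of FIP, before the league constant is added."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(league_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Constant that forces league FIP to equal league ERA."""
    return league_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals, made up for the example.
c = fip_constant(league_era=4.20, lg_hr=5000, lg_bb=15000,
                 lg_hbp=1800, lg_k=40000, lg_ip=43000)
league_fip = fip(5000, 15000, 1800, 40000, 43000, c)
print(abs(league_fip - 4.20) < 1e-9)  # True by construction
```

If a shortened formula (say, one that drops HBP) is paired with a constant computed from the full formula, the league totals will no longer line up.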

I think the game is adjusting by the park factors regarding WAR.

I think more work needs to be done with UBR and BsR reporting.

Most likely, something is not being tallied, or the formulas used for some of these statistics are not accounting for the league as a whole, so the league is not averaging out to 0. The league does of course average out to 0 as a whole in the game when you look at Runs per Game, though.

When we are looking at UBR, BsR, FRM, and ARM for a given player it is tough to tell what that really means in the game. A player could have a positive value but really be average in terms of results in the game, but it is being reported as above average for that particular value.
Old 07-12-2024, 11:24 PM   #4
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
Thanks for your thoughtful response on this!

It's interesting that C arm and blocking runs are wrapped up in ZR. It does seem like there would be value in separating those out.

It would be great if the developers could comment on how the game is actually calculating these things- specifically wOBA, FIP and WAR. While there's a strong correlation between in-game WAR and team wins (.57 for position player WAR and wins, .50 for pitcher WAR and wins, calculated in another study), it makes me skeptical of the validity of this implementation of WAR without knowing what goes into it. And of course, it would be great if they could zero out the things that should be zeroed out (baserunning, framing, outfield arm, pitcher ZR).
Old 07-13-2024, 11:01 AM   #5
Syd Thrift
Hall Of Famer
 
 
Join Date: May 2004
Posts: 10,668
I’m almost positive OOTPD has an agreement with FanGraphs mostly to use their projections as a basis for new season ratings every version but also to use their formulae to calculate WAR and so on.
__________________
Quote:
Originally Posted by Markus Heinsohn
You bastard....
The Great American Baseball Thrift Book - Like reading the Sporting News from back in the day, only with fake players. REAL LIFE DRAMA THOUGH maybe not
Old 07-13-2024, 01:15 PM   #6
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
Right, but the variables/linear weights for specific outcomes vary from year to year, as does the "FIP constant," so the "Fangraphs formula" is actually different from one year to the next. As in, the "value" of a single or a double in one year will be different (per Fangraphs) from another year depending on the run-scoring environment. And the FIP constant is used to get league FIP to equal league ERA, so it also varies from year to year. And there are many other things that are unclear as well... Does OOTP WAR account for a different runs/win calculation from year to year? It's usually about 10, but can be wildly different in some leagues... Does OOTP WAR account for leverage index for relievers? If so, how? Does OOTP wOBA (and thus WAR) include hit by pitches, intentional walks or sacrifice flies? If so, how? I guess what I'm getting at is that I'd love for the devs to show their work and let us know how these numbers are getting generated. It's a black box, and there are some pretty clear errors just on the surface.
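To show how much runs per win can move with the run environment, here is a sketch of the estimate as I understand Fangraphs publishes it: 9 * (league runs / league innings) * 1.5 + 3. Whether OOTP uses this exact form is unknown, and the league totals are hypothetical.

```python
def runs_per_win(league_runs, league_ip):
    """Runs-per-win estimate driven by the league's scoring rate per inning."""
    return 9 * (league_runs / league_ip) * 1.5 + 3


# Hypothetical league totals: a typical environment and a high-scoring one.
print(round(runs_per_win(21000, 43000), 2))  # 9.59
print(round(runs_per_win(30000, 43000), 2))  # 12.42
```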
Old 07-13-2024, 01:29 PM   #7
dodgerblue88
Minors (Triple A)
 
 
Join Date: Jan 2006
Location: Whittier, CA
Posts: 212
Quote:
Originally Posted by jaa36 View Post
-How does OOTP actually go about calculating wOBA, FIP and WAR? Specifically, are there constants that it uses, or does it re-calculate these each year?
I took the Fangraphs formulas for both FIP and the FIP constant and calculated FIP for my fictional league within Power BI. I can confirm that my results differed from what the game was using, in most cases very little, but in some cases by more than half a run. I checked my formula for accuracy by calculating FIP for a historical league and matching the results to the players' Fangraphs pages.

It really wasn't very difficult, and I was able to calculate xFIP as well. I really wish the devs would look into adding xFIP and standardizing the FIP formula the game uses.
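For context, a sketch of xFIP in its usual published form: the same as FIP, except that actual home runs are replaced by fly balls times the league HR/FB rate. This is not the Power BI calculation described above, and it assumes fly-ball counts can be exported from the game. All inputs below are hypothetical.

```python
def xfip(fly_balls, bb, hbp, k, ip, league_hr_per_fb, constant):
    """FIP with home runs normalized to the league HR-per-fly-ball rate."""
    expected_hr = fly_balls * league_hr_per_fb
    return (13 * expected_hr + 3 * (bb + hbp) - 2 * k) / ip + constant


# Hypothetical pitcher line and league values, made up for the example.
print(round(xfip(fly_balls=200, bb=50, hbp=5, k=180, ip=190.0,
                 league_hr_per_fb=0.10, constant=3.20), 2))  # 3.54
```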

Last edited by dodgerblue88; 07-13-2024 at 01:31 PM.
Old 07-14-2024, 11:39 AM   #8
Syd Thrift
Hall Of Famer
 
 
Join Date: May 2004
Posts: 10,668
Quote:
Originally Posted by jaa36 View Post
Right, but the variables/linear weights for specific outcomes vary from year to year, as does the "FIP constant," so the "Fangraphs formula" is actually different from one year to the next. As in, the "value" of a single or a double in one year will be different (per Fangraphs) from another year depending on the run-scoring environment. And the FIP constant is used to get league FIP to equal league ERA, so it also varies from year to year. And there are many other things that are unclear as well... Does OOTP WAR account for a different runs/win calculation from year to year? It's usually about 10, but can be wildly different in some leagues... Does OOTP WAR account for leverage index for relievers? If so, how? Does OOTP wOBA (and thus WAR) include hit by pitches, intentional walks or sacrifice flies? If so, how? I guess what I'm getting at is that I'd love for the devs to show their work and let us know how these numbers are getting generated. It's a black box, and there are some pretty clear errors just on the surface.
I'm pretty positive that OOTP WAR for relievers does take leverage into account because a. leverage is tracked in the game and b. it's part of the FG formula.
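For what it's worth, the leverage adjustment in the published FG method is, as far as I know, a multiplier of (1 + gmLI) / 2 applied to a reliever's WAR, where gmLI is the average leverage index when he enters games. The sketch below assumes that form; whether OOTP does the same is exactly the open question.

```python
def leverage_multiplier(gm_li):
    """Halfway between neutral (1.0) and the reliever's entry leverage."""
    return (1 + gm_li) / 2


def leveraged_reliever_war(raw_war, gm_li):
    return raw_war * leverage_multiplier(gm_li)


# Hypothetical relievers with the same raw WAR but different usage.
print(round(leveraged_reliever_war(1.0, 1.8), 2))  # closer-type usage
print(round(leveraged_reliever_war(1.0, 0.6), 2))  # mop-up usage
```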

To the rest of this, yes, it's a black box. FG doesn't release the formula because it's proprietary and they benefit from the work they do on it, and I'm sure part of the agreement OOTPD has with FG is that they can't release the formula themselves. The only way this formula becomes available to peruse is if OOTPD breaks the agreement and comes up with their own (which seems unlikely, because I'm sure their agreement with FG includes verbiage about how they can't just make up their own formulae based on the old ones) or, for instance, moves on to someone who open-sources their formula, like BBRef (which has its own issues, of course).

On top of that, these things are so complicated nowadays that the vast majority of fans won't have the math background to parse them anyway. Check out some of the formulae that are out there: BoxRec's rating system, for instance (which I've plotted out a couple of times in attempts at "managers" for Title Bout in the past), or Basketball-Reference's "win shares" (which is an absolutely horrendous piece of trash and a giant exercise in the Texas Sharpshooter Fallacy that's hidden behind complexity, but I digress). As a developer, the way I figure my way through these things is to use actual inputs and see how they work in practice. I'm sure actual math people have more sophisticated methods, and even my (crude) path is unavailable to the vast majority of people.

What I do know is that neither bWAR nor fWAR use a linear weights based model because that is a nearly 50 year old method of thinking about baseball events that the community has long since moved past. My fear is that fWAR is or will start moving on to physics-based data, which OOTP of course doesn't use because that's not the model of the game. At present that doesn't *seem* to be the case but I guess we'll see...
Old 07-14-2024, 12:41 PM   #9
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
Thanks for the response!

Actually Fangraphs DOES still use linear weights, and their WAR formula is open source- all of the info for it is here: https://library.fangraphs.com/misc/war/ and the variables they use (for the linear weights and FIP constant) year by year are here: https://www.fangraphs.com/guts.aspx?type=cn

You can debate whether this is the best construct or implementation of WAR (vs Baseball Reference, Baseball Prospectus or others) but I agree, the Fangraphs implementation is the framework that OOTP uses. And given the way that OOTP's engine works (based largely on DIPS), not to mention that they have historically pulled their projections from Fangraphs, this makes sense, though now that PBABIP varies from pitcher to pitcher, there's more reason to use a RA9-based WAR (a la Baseball Reference) than previously.

My question is (in part) whether OOTP actually calculates these linear weights (which I doubt) vs just uses the same numbers every year. I get that the formulas are complicated, I'd just like some visibility about how they're doing it. It's a game based on stats, and we don't really know the specifics of how they're deriving the primary value-based stats even over a decade after WAR was first included in the game.
Old 07-14-2024, 12:59 PM   #10
Syd Thrift
Hall Of Famer
 
 
Join Date: May 2004
Posts: 10,668
My bad regarding linear weights!
Old 07-15-2024, 04:43 AM   #11
Matt Arnold
OOTP Developer
 
 
Join Date: Jun 2009
Location: Here and there
Posts: 16,232
Our versions of WAR won't match 100%, sometimes due to the nature of how we track some data vs how those other sites do. Defense certainly will vary a lot more, but even something like our linear weights calculations may end up with slightly different constants than the other versions use. We do calculate them, but again sometimes the data will vary a little. Sometimes it's an AL vs NL split, or often it can be a case of how to handle DH vs no-DH.

As best we can, we try to get close to the other metrics. So where there is a published calculation and algorithm, we've done our best to get as close to that as possible.
Old 07-15-2024, 06:32 AM   #12
kq76
Global Moderator
 
 
Join Date: Nov 2002
Posts: 12,022
This thread is awesome! Thanks for starting it and including your work. I don't understand it all as much as I wish, but I'm happy to try to.

Quote:
Originally Posted by jaa36 View Post
And would love to have more clarity from the developers on how they do this! And even better, have this show up in the manual or in a tooltip.
Agreed!

Quote:
Originally Posted by jaa36 View Post
The first thing I was interested in was whether OOTP accurately captured the split of WAR between position players and pitchers. Fangraphs and Baseball Reference define this as 570 WAR for position players and 430 WAR for pitchers, adding up to 1000 WAR total across 30 teams. (This assumes a replacement-level winning percentage of .294, or 47.7 wins for a team composed entirely of replacement-level players.) OOTP does decently well on this: the league totals worked out to 573.4 WAR for position players and 447.0 fWAR for pitchers, a little high for the pitchers, but not too bad. One thing to note here is that the "Team Statistics" page miscalculates (and grossly overestimates) pitching WAR. That page says the teams added up to 583.1 fWAR and 588.9 rWAR, which I think is related to using a different replacement level, so I went back to the individual team summaries, which had the correct sums of the individually listed fWARs.
I wonder, would 2-way players affect this in any way?

At the very least I'd think OOTP should copy the 1000 total number regardless of how it was split.

I wonder what happens when we're playing a league that is not 30 teams however. Does it just scale down, like if it was a 20 team league the total would be 667 total WAR?

Quote:
Originally Posted by jaa36 View Post
I added up the totals of UBR and BsR for each team, and they worked out to -714.6 and -714.9 respectively, but they should be zero. Likewise, zone ratings for pitchers added up to a total of 74.9 and should be zero.
I knew it! I was convinced something was crazy off about the numbers I was seeing, but I didn't know how to prove it. Thanks for doing it.

Quote:
Originally Posted by dodgerblue88 View Post
I took the Fangraphs formulas for both FIP and the FIP constant and calculated FIP for my fictional league within Power BI. I can confirm that my results differed from what the game was using, in most cases very little, but in some cases by more than half a run. I checked my formula for accuracy by calculating FIP for a historical league and matching the results to the players' Fangraphs pages.

It really wasn't very difficult, and I was able to calculate xFIP as well. I really wish the devs would look into adding xFIP and standardizing the FIP formula the game uses.
Would you mind sharing your spreadsheet as well?

And I agree, I really wish xFIP (and other newer/better ERA estimators) was in the game by now.

I also wish RA9 in the game actually meant what it means nearly everywhere else on the internet, actual Runs Allowed per 9 and not Runners Allowed per 9. If they really want the latter, fine, but call it something else, like BRA9.

And add HR% and H%, which like K% and BB%, both of which are in the game, are better than their /9 equivalents.

Quote:
Originally Posted by Matt Arnold View Post
Our versions of WAR won't match 100%, sometimes due to the nature of how we track some data vs how those other sites do. Defense certainly will vary a lot more, but even something like our linear weights calculations may end up with slightly different constants than the other versions use. We do calculate them, but again sometimes the data will vary a little. Sometimes it's an AL vs NL split, or often it can be a case of how to handle DH vs no-DH.

As best we can, we try to get close to the other metrics. So where there is a published calculation and algorithm, we've done our best to get as close to that as possible.
This sounds like you're saying the game recalculates the constants and coefficients every year based on the league we're playing, but can you confirm that?
Old 07-15-2024, 06:38 AM   #13
Matt Arnold
OOTP Developer
 
 
Join Date: Jun 2009
Location: Here and there
Posts: 16,232
Quote:
Originally Posted by kq76 View Post
I wonder, would 2-way players affect this in any way?

At the very least I'd think OOTP should copy the 1000 total number regardless of how it was split.

I wonder what happens when we're playing a league that is not 30 teams however. Does it just scale down, like if it was a 20 team league the total would be 667 total WAR?
We don't adjust the totals, but it just naturally adjusts. If your league size is 2/3 of MLB, and your league schedule length was 80 instead of 162, then you'd just naturally accumulate around 333 war. It might be more if your league is somehow skewed.
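A back-of-the-envelope sketch of that natural scaling, assuming total WAR is simply proportional to team count and schedule length relative to a 30-team, 162-game MLB with a 1000-WAR pool. The function and its defaults are illustrative, not how the game computes anything.

```python
def expected_total_war(teams, games, mlb_total=1000.0, mlb_teams=30, mlb_games=162):
    """Scale the MLB WAR pool by league size and schedule length."""
    return mlb_total * (teams / mlb_teams) * (games / mlb_games)


print(round(expected_total_war(20, 162)))  # 667
print(round(expected_total_war(20, 80)))   # 329, in line with "around 333"
```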
Quote:

This sounds like you're saying the game recalculates the constants and coefficients every year based on the league we're playing, but can you confirm that?
Each league year, depending on the totals, calculates different constants. So internally we more or less create a table like the Fangraphs guts table (https://www.fangraphs.com/guts.aspx?type=cn) for calculating wOBA/WAR/FIP/etc.
Old 07-15-2024, 12:49 PM   #14
jaa36
Hall Of Famer
 
 
Join Date: May 2011
Posts: 3,123
Thanks for the response, and glad I got your attention. It's good to know that the game calculates the constants each year. Is there a way to retrospectively figure out what constants the game is using, other than running complicated regression analyses?
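For FIP at least, there is a shortcut that avoids regression: subtract the component part of the formula from each pitcher's in-game FIP and look at what is left. The sketch below assumes the standard 13/3/2 weights; the rows are hypothetical stand-ins for an export of in-game pitching stats.

```python
from statistics import mean, pstdev


def implied_fip_constants(pitchers):
    """For each pitcher, in-game FIP minus the component part of the formula."""
    return [
        p["fip"] - (13 * p["hr"] + 3 * (p["bb"] + p["hbp"]) - 2 * p["k"]) / p["ip"]
        for p in pitchers
    ]


# Hypothetical rows, made up for the example.
sample = [
    {"fip": 3.50, "hr": 20, "bb": 50, "hbp": 5, "k": 180, "ip": 200.0},
    {"fip": 4.60, "hr": 25, "bb": 60, "hbp": 4, "k": 120, "ip": 180.0},
]
constants = implied_fip_constants(sample)
print(round(mean(constants), 2), round(pstdev(constants), 2))
```

If the game uses one constant with the standard weights, the implied values should agree to within display rounding. A wide spread would point to different weights or an extra adjustment (park, league, and so on).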

And perhaps more importantly, would it be possible to address some of the other oddities identified in the thread- specifically the baserunning metrics, pitcher ZR, framing runs and outfield arm runs not zeroing out, and also the inconsistency in reporting of WAR from the "team statistics" pages vs the individual team summaries?

Also, I totally agree that having RA9 (runs allowed per 9) as a stat would be super helpful.

Thanks for all the input here, very helpful.
