| OOTP 25 - General Discussions Everything about the brand new 25th Anniversary Edition of Out of the Park Baseball - officially licensed by MLB, the MLBPA, KBO and the Baseball Hall of Fame. |
#1 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
quibbles and questions about OOTP WAR implementation
Hi, sorry for long post. I have a few questions about the way WAR is implemented in-game that I was hoping the developers (or others) might be able to comment on...
There has been some discussion on some of these points over the years, including this https://forums.ootpdevelopments.com/...d.php?t=243191 (from 10 years ago!) but many of the same questions remain... To be clear, I'm not questioning the utility of WAR, more just trying to piece together how OOTP calculates it. And would love to have more clarity from the developers on how they do this! And even better, have this show up in the manual or in a tooltip.

To study this, I put together a fictional 30-team test league with DH, and otherwise out-of-the-box settings. I neutralized all the park factors to minimize park effects. I then simmed a season and collected the stats.

The first thing I was interested in was whether OOTP accurately captured the split of WAR between position players and pitchers. This is defined at Fangraphs and Baseball Reference as 570 WAR for position players and 430 WAR for pitchers, adding up to 1000 WAR total across 30 teams. (This assumes a replacement-level winning percentage of .294, or 47.7 wins for a team composed entirely of replacement-level players.) OOTP does decently well on this, as the league totals worked out to 573.4 WAR for position players and 447.0 fWAR for pitchers... a little high for the pitchers, but not too bad.

One thing to note here is that the "Team Statistics" page calculates pitching WAR inaccurately (and grossly overestimates it): this page says that the teams added up to 583.1 fWAR and 588.9 rWAR, which I think is related to using a different replacement level. So I went back to the individual team summaries, which had the correct sums of individually-listed fWARs.

The second thing to note is that there are some errors in the implementation of several baserunning and fielding metrics. I added up the totals of UBR and BsR for each team, and they worked out to -714.6 and -714.9 respectively, but they should be zero. Likewise, zone ratings for pitchers added up to a total of 74.9 and should be zero. Framing runs for catchers added up to 224.6 runs and arm runs for outfielders added up to 354.1 runs; both of those should be zero also. (Catching arm or blocking runs don't seem to be calculated at all, or at least are not listed.) By contrast, the sums of batting runs (-0.8), wRAA (13.6) and most other zone ratings were pretty close to zero, as they should be.

The next thing I'll mention is that it's not entirely clear to me how wOBA or FIP are derived in-game. Both of these are somewhat complicated to calculate, requiring annual re-assessment of several constants based on the league's run environment. The league's wOBA should be defined as equal to the league's OBP. In this test, the league's OBP was .324, and so anything above a .324 wOBA should end up with positive wRAA, but the actual cut-point seemed to be around .318. The cut points for replacement-level FIP in this test seemed to be around 5.38 for starting pitchers and 4.53 for relievers.

So the questions that come up for me here are: does OOTP use a "consistent" formula for calculating wOBA and FIP for WAR, or does it actually calculate the constants from year to year? I somewhat doubt it does the latter, but struggled to figure out what it's actually doing. I tried stacking up the in-game results against a "default" wOBA formula [(0.7*(BB+HBP)+0.9*1B+1.25*2B+1.6*3B+2.0*HR)/PA] and also tried incorporating intentional walks and sac flies, as well as a version based on 1.8*OBP+SLG. All of these got me close to the wOBA numbers the game uses, but none matched exactly. Likewise for FIP: I can calculate a "FIP constant" to get the league FIP to equal the league ERA (as it's defined), but if I use that constant, it gets me close-ish to the in-game FIPs for individual players, but not exactly there.

The other thing I wonder about is the effects of park factors on all this. I looked at one team, which ended up scoring and allowing a combined 609 runs at home and 706 runs on the road. Again, this was in a league where all park factors were neutralized. I don't know if OOTP makes adjustments to WAR based on the "theoretical" park factors (which would mean no adjustment in this specific case) or the "measured" park factors (which would mean position players for this team would get a boost for playing in a park that "penalizes" run scoring).

Attaching the spreadsheet collecting the data here.

So in summary, my questions are:
- How does OOTP actually go about calculating wOBA, FIP and WAR? Specifically, are there constants that it uses, or does it re-calculate these each year?
- Why is the pitching WAR listed on the "team statistics" pages different from the totals found on the individual team pages? And why is pitching WAR slightly higher than it "should" be?
- Why are UBR/BsR, pitching ZR, catching framing runs and arm runs not zeroed out?
- Does OOTP make adjustments based on "theoretical" park factors or "actual" park factors based on results?

Any clarification would be helpful! Thanks- |
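For anyone following the arithmetic behind the 1000-WAR convention in the post above, here is a minimal sketch. The function name `war_budget` and the idea of scaling the total linearly with the number of teams are assumptions made for illustration, not anything OOTP or the stat sites are confirmed to do; the 570/430 split and the observed totals (573.4 and 447.0) come from the post.

```python
def war_budget(teams=30, games=162, war_per_30_teams=1000.0, hitter_share=0.57):
    """Scale the 1000-WAR-per-30-teams convention to a league of `teams` clubs.

    Assumption: the total scales linearly with team count, which keeps the
    replacement-level winning percentage the same for any league size.
    """
    total_war = war_per_30_teams * teams / 30
    war_per_team = total_war / teams
    average_wins = games / 2                      # a .500 team
    replacement_wins = average_wins - war_per_team
    return {
        "total_war": total_war,
        "position_war": total_war * hitter_share,
        "pitching_war": total_war * (1 - hitter_share),
        "replacement_wins": replacement_wins,
        "replacement_pct": replacement_wins / games,
    }


budget = war_budget()
observed_total = 573.4 + 447.0   # league totals reported in the post

print(round(budget["replacement_wins"], 1))             # 47.7
print(round(budget["replacement_pct"], 3))              # 0.294
print(round(observed_total - budget["total_war"], 1))   # 20.4
```

Under these assumptions the test league came out about 20 WAR over the nominal budget, nearly all of it on the pitching side (447.0 against 430).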
|
|
|
|
|
#2 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
Bump! Any ideas on this?
|
|
|
|
|
|
#3 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,371
|
I have been looking at Framing as well and it appears that the game is not tallying those results properly because there is always more positive than negative FRM in the league.
This is also occurring with ARM for outfielders too. I suspect that there is something missing in the evaluation of how the game is reporting those statistics. I am planning to report a bug again on those issues. A bug about these was logged previously and I think some changes were made, but the issue remains.

Regarding catcher blocking and arm, those are actually wrapped up in their ZR value.

Ideally league FIP should be an exact match to league ERA in any season. If they are using the shorter form of the FIP formula this will not work out, but if FIP is being calculated correctly it will match the ERA of the league in any season.

I think the game is adjusting by the park factors regarding WAR.

I think more work needs to be done with UBR and BsR reporting. Most likely something is not being tallied, or the formulas used for some of these statistics are not accounting for the league as a whole, so the league is not averaging out to 0. The league of course does indeed average out to 0 as a whole in the game when you look at Runs per Game though.

When we are looking at UBR, BsR, FRM, and ARM for a given player it is tough to tell what that really means in the game. A player could have a positive value but really be average in terms of results in the game, while being reported as above average for that particular value. |
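To make the "league FIP should match league ERA" point concrete, here is a rough sketch using the standard published FIP form, (13*HR + 3*(BB+HBP) - 2*K)/IP plus a constant. The league totals below are invented purely for illustration, and whether OOTP uses exactly these weights is not confirmed anywhere in this thread.

```python
def fip_core(hr, bb, hbp, k, ip):
    """The part of FIP before the constant. `ip` must be true decimal innings
    (200 1/3 innings is 200.333..., not the box-score notation 200.1)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(league_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """Choose the constant so that league FIP equals league ERA by construction."""
    return league_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals, NOT taken from the test league in this thread.
league = dict(lg_hr=5000, lg_bb=15000, lg_hbp=1800, lg_k=40000, lg_ip=43000.0)
league_era = 4.20

c = fip_constant(league_era, **league)
league_fip = fip(league["lg_hr"], league["lg_bb"], league["lg_hbp"],
                 league["lg_k"], league["lg_ip"], c)

print(round(c, 3))
print(round(league_fip, 2))
```

With the constant derived this way, the league-wide FIP reproduces the league ERA exactly; a fixed constant carried over from another season generally would not.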
|
|
|
|
|
#4 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
Thanks for your thoughtful response on this!
It's interesting that C arm and blocking runs are wrapped up in ZR. It does seem like there would be value in separating those out.

It would be great if the developers could comment on how the game is actually calculating these things, specifically wOBA, FIP and WAR. There's a strong correlation between in-game WAR and team wins (.57 for position player WAR and wins, .50 for pitcher WAR and wins, calculated in another study), but not knowing what goes into it makes me skeptical of the validity of this implementation of WAR.

And of course, it would be great if they could zero out the things that should be zeroed out (baserunning, framing, outfield arm, pitcher ZR). |
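As a sketch of what "zeroing out" could look like, the snippet below recenters a runs-above-average metric so the league total is zero, by subtracting the league-average rate times each player's playing time. This is one simple assumption about how to do it, not a description of what OOTP or Fangraphs actually does, and the framing numbers are made up.

```python
def recenter(values, playing_time):
    """Shift a runs-above-average metric so that the league total is zero.

    Each player gives back the league-average rate multiplied by their own
    opportunities (innings here), so full-time players absorb more of the shift.
    """
    league_rate = sum(values) / sum(playing_time)
    return [v - league_rate * pt for v, pt in zip(values, playing_time)]


# Hypothetical catcher framing runs and innings caught.
frm = [6.0, 3.5, 1.0, -0.5]
innings = [1000, 800, 600, 400]

adjusted = recenter(frm, innings)
print(sum(frm))                        # raw league total is not zero
print([round(x, 2) for x in adjusted])
print(round(sum(adjusted), 6))         # recentred total
```

Note that the third catcher shows +1.0 framing runs in the raw numbers but comes out below average after recentering, which is the kind of misreading a non-zero league total invites.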
|
|
|
|
|
#5 | |
|
Hall Of Famer
Join Date: May 2004
Posts: 10,668
|
I’m almost positive OOTPD has an agreement with FanGraphs mostly to use their projections as a basis for new season ratings every version but also to use their formulae to calculate WAR and so on.
|
|
|
|
|
|
|
#6 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
Right, but the variables/linear weights for specific outcomes vary from year to year, as does the "FIP constant," so the "Fangraphs formula" is actually different from one year to the next. As in, the "value" of a single or a double in one year will be different (per Fangraphs) from another year depending on the run-scoring environment. And the FIP constant is used to get league FIP to equal league ERA, so it also varies from year to year.

And there are many other things that are unclear as well...
Does OOTP WAR account for a different runs/win calculation from year to year? It's usually about 10, but can be wildly different in some leagues...
Does OOTP WAR account for leverage index for relievers? If so, how?
Does OOTP wOBA (and thus WAR) include hit by pitches, intentional walks or sacrifice flies? If so, how?

I guess what I'm getting at is, I'd love for the devs to show their work and let us know how these numbers are getting generated. It's a black box, and there are some pretty clear errors just on the surface.
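To show why the handling of intentional walks and sac flies matters, here is a small sketch comparing two wOBA variants. The weights are the "default" ones quoted in the first post of this thread, not OOTP's or any particular season's Fangraphs values; the second variant follows the published Fangraphs convention of excluding intentional walks and using AB + uBB + SF + HBP as the denominator. The batting line and all names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    pa: int
    ab: int
    singles: int
    doubles: int
    triples: int
    hr: int
    bb: int
    ibb: int
    hbp: int
    sf: int


# Illustrative weights from the first post; real weights change every season.
W_WALK, W_1B, W_2B, W_3B, W_HR = 0.7, 0.9, 1.25, 1.6, 2.0


def hit_value(line):
    return (W_1B * line.singles + W_2B * line.doubles
            + W_3B * line.triples + W_HR * line.hr)


def woba_simple(line):
    """All walks and HBP count; the denominator is plate appearances."""
    return (W_WALK * (line.bb + line.hbp) + hit_value(line)) / line.pa


def woba_ibb_sf(line):
    """Intentional walks dropped; denominator is AB + uBB + SF + HBP."""
    ubb = line.bb - line.ibb
    numerator = W_WALK * (ubb + line.hbp) + hit_value(line)
    return numerator / (line.ab + ubb + line.sf + line.hbp)


line = BattingLine(pa=600, ab=520, singles=100, doubles=30, triples=3,
                   hr=20, bb=65, ibb=5, hbp=5, sf=5)
print(round(woba_simple(line), 4))
print(round(woba_ibb_sf(line), 4))
```

The two variants differ only slightly for this line, which fits the "close, but not exact" experience described earlier in the thread: small definitional choices are enough to prevent an exact match.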
|
|
|
|
|
|
#7 | |
|
Minors (Triple A)
Join Date: Jan 2006
Location: Whittier, CA
Posts: 212
|
Quote:
It really wasn't very difficult, and I was able to calculate xFIP as well. I really wish the devs would look into adding xFIP and standardizing the FIP formula the game uses.
Last edited by dodgerblue88; 07-13-2024 at 01:31 PM. |
|
|
|
|
|
|
#8 | ||
|
Hall Of Famer
Join Date: May 2004
Posts: 10,668
|
Quote:
To the rest of this, yes, it's a black box. FG doesn't release the formula because it's proprietary and they benefit from some of the work they do on it, and I'm sure that part of the agreement OOTPD has with FG is that they don't/can't release the formula themselves. The only way you get this formula available to peruse is if OOTPD breaks the agreement and comes up with their own (which seems unlikely because I'm sure their agreement with FG includes verbiage about how they can't just make up their own formulae that are based on old ones) or, for instance, moves on to someone who open-sources their formula like BBRef (which has its own issues, of course).

And at that, these things are so complicated nowadays that the vast majority of fans won't have the understanding of math to parse them anyway. Check out some of the formulae that are out there - BoxRec's rating system, for instance (which I've plotted out a couple times for attempts at "managers" for Title Bout in the past) or Basketball-Reference's "win shares" (which is an absolutely horrendous piece of trash and a giant exercise in the Texas Sharpshooter Fallacy that's hidden behind complexity, but I digress).

As a developer, the way I figure my way through these things is to use actual inputs and see how they work in practice. I'm sure actual math people have more sophisticated methods, and even my (crude) path is unavailable to the vast majority of people.

What I do know is that neither bWAR nor fWAR use a linear weights based model because that is a nearly 50 year old method of thinking about baseball events that the community has long since moved past. My fear is that fWAR is or will start moving on to physics-based data, which OOTP of course doesn't use because that's not the model of the game. At present that doesn't *seem* to be the case but I guess we'll see...
|
||
|
|
|
|
|
#9 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
Thanks for response!
Actually Fangraphs DOES still use linear weights, and their WAR formula is open source. All of the info for it is here: https://library.fangraphs.com/misc/war/ and the variables they use (for the linear weights and FIP constant) year by year are here: https://www.fangraphs.com/guts.aspx?type=cn

You can debate whether this is the best construct or implementation of WAR (vs Baseball Reference, Baseball Prospectus or others) but I agree, the Fangraphs implementation is the framework that OOTP uses. And given the way that OOTP's engine works (based largely on DIPS), not to mention that they have historically pulled their projections from Fangraphs, this makes sense. That said, now that PBABIP varies from pitcher to pitcher, there's more reason to use a RA9-based WAR (a la Baseball Reference) than previously.

My question is (in part) whether OOTP actually calculates these linear weights (which I doubt) vs just uses the same numbers every year. I get that the formulas are complicated, I'd just like some visibility into how they're doing it. It's a game based on stats, and we don't really know the specifics of how they're deriving the primary value-based stats even over a decade after WAR was first included in the game. |
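For reference, the published Fangraphs conversion from wOBA to runs is wRAA = (wOBA - league wOBA) / wOBA scale * PA, so the league wOBA is exactly the point where wRAA changes sign. The sketch below uses the .324 and .318 figures from the first post of this thread; the wOBA scale of 1.2 is an assumed typical value, not one taken from the game.

```python
def wraa(woba, league_woba, pa, woba_scale=1.2):
    """Weighted runs above average. `woba_scale` of 1.2 is an assumed,
    typical value; the real one is recalculated for every season."""
    return (woba - league_woba) / woba_scale * pa


player_woba, pa = 0.320, 600

# If the baseline were the league OBP (.324), this hitter is below average...
print(round(wraa(player_woba, 0.324, pa), 1))   # -2.0
# ...but with the cut-point observed in the test league (.318) he is above it.
print(round(wraa(player_woba, 0.318, pa), 1))   # 1.0
```

A six-point gap in the baseline moves a full-time hitter by about three runs here, so which league wOBA the game uses feeds directly into the batting component of WAR.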
|
|
|
|
|
#10 | |
|
Hall Of Famer
Join Date: May 2004
Posts: 10,668
|
My bad regarding linear weights!
|
|
|
|
|
|
|
#11 |
|
OOTP Developer
Join Date: Jun 2009
Location: Here and there
Posts: 16,232
|
Our versions of WAR won't match 100%, sometimes due to the nature of how we track some data vs how those other sites do. Defense certainly will vary a lot more, but even something like our linear weights calculations may end up with slightly different constants than the other versions use. We do calculate them, but again sometimes the data will vary a little. Sometimes it's an AL vs NL split, or often it can be a case of how to handle DH vs no-DH.
As best we can, we try to get close to the other metrics. So where there is a published calculation and algorithm, we've done our best to get as close to that as possible. |
|
|
|
|
|
#12 | |||||
|
Global Moderator
Join Date: Nov 2002
Posts: 12,022
|
This thread is awesome! Thanks for starting it and including your work. I don't understand it all as much as I wish, but I'm happy to try to.
Quote:
Quote:
At the very least I'd think OOTP should copy the 1000 total number regardless of how it was split. I wonder what happens when we're playing a league that is not 30 teams however. Does it just scale down, like if it was a 20 team league the total would be 667 total WAR? Quote:
Quote:
And I agree, I really wish xFIP (and other newer/better ERA estimators) were in the game by now. I also wish RA9 in the game actually meant what it means nearly everywhere else on the internet: actual Runs Allowed per 9, not Runners Allowed per 9. If they really want the latter, fine, but call it something else, like BRA9. And add HR% and H%, which, like K% and BB% (both of which are in the game), are better than their /9 equivalents. Quote:
__________________
My OOTP Wishlist | My FAQ List OOTP Wiki | Your Recommended Team Nicknames, By City (A Crowdsourced Project) For Beta/Devs: Full screen (1920x1080) |
|||||
|
|
|
|
|
#13 | ||
|
OOTP Developer
Join Date: Jun 2009
Location: Here and there
Posts: 16,232
|
Quote:
Quote:
|
||
|
|
|
|
|
#14 |
|
Hall Of Famer
Join Date: May 2011
Posts: 3,123
|
Thanks for the response and glad I got your attention
That's good to know that the game calculates the constants each year. Is there a way to retrospectively figure out what constants the game is using, other than running complicated regression analyses?

And perhaps more importantly, would it be possible to address some of the other oddities identified in the thread, specifically the baserunning metrics, pitcher ZR, framing runs and outfield arm runs not zeroing out, and also the inconsistency in reporting of WAR between the "team statistics" pages and the individual team summaries?

Also, I totally agree that having RA9 (runs allowed per 9) as a stat would be super helpful. Thanks for all the input here, very helpful. |
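Short of a full regression, one quick check on the FIP side is to back out the constant each pitcher's listed FIP implies, assuming the standard 13/3/2 weights. If the implied constants agree to within display rounding, the game is probably using that form with a single constant; if they scatter, the weights or inputs differ. The pitcher lines below are invented for illustration, and the dictionary keys are hypothetical names, not an OOTP export format.

```python
def implied_constants(pitchers):
    """For each pitcher, subtract the standard FIP core from the listed FIP.

    Assumes the 13*HR + 3*(BB+HBP) - 2*K weighting and true decimal innings.
    """
    constants = []
    for p in pitchers:
        core = (13 * p["hr"] + 3 * (p["bb"] + p["hbp"]) - 2 * p["k"]) / p["ip"]
        constants.append(p["listed_fip"] - core)
    return constants


# Hypothetical stat lines with FIP as it would be displayed (two decimals).
sample = [
    {"hr": 20, "bb": 50, "hbp": 5, "k": 180, "ip": 200.0, "listed_fip": 3.53},
    {"hr": 10, "bb": 30, "hbp": 2, "k": 60, "ip": 70.0, "listed_fip": 4.71},
]

constants = implied_constants(sample)
spread = max(constants) - min(constants)
print([round(c, 3) for c in constants])
print(round(spread, 3))
```

A spread of about 0.01 or less is consistent with rounding in the displayed FIP; anything much larger would suggest the game is not using this exact formula.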
|
|
|