I suspect most of you are also following this thread on the same topic, but for those who aren't, I thought you'd be interested in the calibration report I got from the agent.
Here's my post; check the original thread for context if you feel the urge.
From ChatGPT...
I kept the full file intact, including the two-way players, and then calculated the league environment three ways: with no PA cutoff at all, with a 50 PA minimum for a cleaner player pool, and with a 100 PA minimum as an even more stable reference point. That gives you both the pure league environment and the cleaner signal once the smallest samples are stripped out.
League environment
With no cutoff, the file contains 503 players and 152,819 total PA. The PA-weighted league-average line comes out to .3363 wOBA against .3135 xwOBA, .3426 OBP, .2745 AVG, .4360 SLG, .3035 BABIP, 100.4 wRC+, 101.4 OPS+, 8.83% BB, 15.63% K, 28.80% chase rate, 77.82% Z-Contact, 29.24% whiff rate, 83.78 EV, 114.60 max EV, 10.85 launch angle, 7.43% barrel rate, 23.71% hard-hit rate, and 11.78% HR/FB.
At 50 PA and above, the pool drops to 375 players and 150,769 PA, which means you are barely losing any plate appearances and mostly just removing noise. The weighted league line becomes .3370 wOBA against .3140 xwOBA, .3432 OBP, .2750 AVG, .4370 SLG, .3040 BABIP, 100.8 wRC+, 101.8 OPS+, 8.86% BB, 15.63% K, 28.77% chase, 77.81% Z-Contact, 29.24% whiff, 83.82 EV, 114.71 max EV, 10.85 launch angle, 7.47% barrel rate, 23.79% hard-hit rate, and 11.85% HR/FB.
At 100 PA and above, the numbers barely move again. The weighted line is .3377 wOBA against .3146 xwOBA, .3438 OBP, .2754 AVG, .4383 SLG, .3042 BABIP, 101.3 wRC+, 102.3 OPS+, 8.88% BB, 15.60% K, 28.76% chase, 77.84% Z-Contact, 29.22% whiff, 83.87 EV, 114.78 max EV, 10.89 launch angle, 7.52% barrel rate, 23.90% hard-hit rate, and 11.92% HR/FB.
The important point is that the environment is very stable once you remove the tiniest samples. That is a good sign in itself.
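For anyone who wants to reproduce the PA-weighted lines from their own export, here is a minimal sketch of the calculation at each cutoff. The rows are made-up toy values, not the actual file, and the name pa_weighted_mean is just a placeholder for illustration.

```python
# Toy rows: (player, PA, wOBA). These are invented values, NOT the real export.
players = [
    ("A", 600, 0.350),
    ("B", 300, 0.320),
    ("C", 40, 0.250),
    ("D", 10, 0.500),
]


def pa_weighted_mean(rows, min_pa=0):
    """PA-weighted mean of a rate stat over players with at least min_pa PA."""
    pool = [(pa, stat) for _, pa, stat in rows if pa >= min_pa]
    total_pa = sum(pa for pa, _ in pool)
    if total_pa == 0:
        raise ValueError("no plate appearances left after the cutoff")
    # Each player's stat counts in proportion to his plate appearances.
    return sum(pa * stat for pa, stat in pool) / total_pa


# Same three pools as in the report: no cutoff, 50+ PA, 100+ PA.
for cutoff in (0, 50, 100):
    print(cutoff, round(pa_weighted_mean(players, cutoff), 4))
```

Because the weights are plate appearances, dropping a 10 PA player barely moves the line even if his rate stat is extreme, which is the same behaviour the three league lines show.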
Actual versus expected tracking
This is where the file gets more interesting. Leaguewide, actual production is running ahead of expected production.
With no cutoff, league wOBA is .3363 versus .3135 xwOBA, so the weighted gap is +.0228. At 50 PA and above, the gap is still +.0230. At 100 PA and above, it is +.0231. That consistency matters. This is not a tiny-sample artifact. The sim appears to be producing a league where actual wOBA sits about twenty-three points above xwOBA.
By contrast, slugging is almost perfectly aligned. League SLG is .4360 versus .4309 xSLG with no cutoff, a gap of only +.0050. At 50 PA and above the gap is +.0053, and at 100 PA and above it is +.0053 again. That is very tight.
Batting average also runs a little hot, but not wildly so. AVG is .2745 versus .2672 xBA with no cutoff, a gap of +.0074. At 50 PA and above it is +.0076, and at 100 PA and above it is +.0080.
So if you are looking for the headline takeaway, it is this: the sim’s expected slugging and expected batting average are tracking fairly tightly to the realised league environment, but xwOBA is running meaningfully below actual wOBA at the league level.
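The gap arithmetic is easy to check from the rounded league lines quoted above. This is only a sketch using the figures already in this post (the dictionary layout and names are my own), and since the inputs are rounded to four decimals, a recomputed gap can differ from a reported one in the last digit.

```python
# League lines as quoted in this post: (actual, expected) per PA cutoff.
woba_lines = {
    "no cutoff": (0.3363, 0.3135),
    "50+ PA": (0.3370, 0.3140),
    "100+ PA": (0.3377, 0.3146),
}

# Only the no-cutoff xSLG and xBA values are quoted, so only those are checked.
other_lines = {
    "SLG vs xSLG": (0.4360, 0.4309),
    "AVG vs xBA": (0.2745, 0.2672),
}


def gap(actual, expected):
    """Actual minus expected, rounded to four decimals like the report."""
    return round(actual - expected, 4)


woba_gaps = {pool: gap(a, x) for pool, (a, x) in woba_lines.items()}
other_gaps = {name: gap(a, x) for name, (a, x) in other_lines.items()}

print(woba_gaps)   # wOBA minus xwOBA at each cutoff
print(other_gaps)  # SLG and AVG gaps with no cutoff
```

The wOBA gaps come out at .0228, .0230 and .0231, matching the report. The SLG and AVG gaps recompute to .0051 and .0073 against the reported +.0050 and +.0074, which is presumably just rounding in the quoted inputs.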
Player-level spread
Using the 50 PA pool, the average player-level wOBA minus xwOBA gap is +.0203, and the median is +.0230. The 10th percentile is about -.019 and the 90th percentile is about +.0596. More importantly, 54.4% of players are more than .020 above xwOBA, while only 9.6% are more than .020 below it. That is a very lopsided distribution. If the expected model were fully centred, you would normally expect something more balanced around zero.
For SLG minus xSLG in the 50 PA pool, the average gap is only +.0022 and the median is +.0060. The 10th percentile is -.066 and the 90th percentile is +.0706. That looks much more naturally centred. There is real variance, but not an obvious directional bias.
For AVG minus xBA, the average gap is +.0023 and the median is +.0030. The 10th percentile is -.0396 and the 90th percentile is +.0410. Again, that looks much healthier than the xwOBA distribution.
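The spread figures above (mean, median, 10th and 90th percentiles, and the share of players beyond plus or minus .020) can be sketched like this. The deltas are invented toy values, the function name spread_summary is hypothetical, and the percentile method is an assumption, since the report does not say which interpolation it used.

```python
import statistics

# Toy per-player (actual minus expected) deltas. Invented, NOT from the file.
deltas = [0.035, 0.028, 0.021, 0.010, 0.000,
          -0.005, 0.045, -0.025, 0.023, 0.015]


def spread_summary(values, threshold=0.020):
    """Unweighted centre, tails and over/under shares for a list of deltas."""
    # quantiles(n=10) returns the 9 decile cut points; first is p10, last is p90.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    n = len(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "p10": deciles[0],
        "p90": deciles[-1],
        "share_above": sum(v > threshold for v in values) / n,
        "share_below": sum(v < -threshold for v in values) / n,
    }


summary = spread_summary(deltas)
for key, value in summary.items():
    print(f"{key}: {value:.4f}")
```

With a centred expected-stat model, share_above and share_below should come out roughly equal. A split like the 54.4% versus 9.6% reported for xwOBA is what a directional bias looks like.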
Relationship tracking
This is the other major realism check. I focused on the 50 PA pool here, because it gives you cleaner player-to-player signal.
wOBA versus xwOBA has a correlation of 0.663. SLG versus xSLG is 0.689. AVG versus xBA is 0.549. EV versus xwOBA is 0.673. Barrel% versus SLG is 0.574. BB% versus OBP is 0.597. K% versus Z-Contact is -0.868. Chase rate versus BB% is -0.477.
Those are actually pretty encouraging. The stats are behaving in the right directions, and in most cases with healthy strength. The strongest relationship in the set by magnitude is K% versus Z-Contact, and it carries the negative sign you would want to see. EV and barrel rate are also mapping sensibly to damage production. So the internal logic of the metrics looks broadly baseball-like.
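For reference, the correlations above are presumably plain Pearson coefficients across players, although the report does not say so explicitly. A minimal sketch under that assumption, with invented K% and Z-Contact values and a hand-rolled pearson_r helper:

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (2 or more)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy player values, invented for illustration only.
k_pct = [10.0, 14.0, 18.0, 22.0, 26.0]
z_contact = [86.0, 82.0, 79.0, 74.0, 70.0]

r = pearson_r(k_pct, z_contact)
print(f"K% vs Z-Contact: r = {r:.3f}")  # strongly negative for this toy data
```

Note that this version is unweighted, so a 50 PA player counts the same as a 600 PA player. That is one reason the relationship checks are done on a pool with a PA minimum.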
What this suggests
My read is that the simulation’s modern stats do not look random or decorative. They are interacting in coherent ways. Contact quality is mapping to power. Zone contact is strongly tied to strikeout rate. Chase is negatively tied to walks. Expected slugging is close to realised slugging. Those are all good signs.
The main red flag is xwOBA. On a leaguewide weighted basis it is consistently about twenty-three points lower than actual wOBA, and the player-level deltas are skewed heavily toward overperformance. That suggests either the xwOBA model is too conservative, or the sim is generating enough non-contact or sequencing value that wOBA is systematically outrunning the expected version. Either way, that is the one metric in this file that looks least well calibrated.
The practical verdict
If I were grading this file purely on realism behaviour, I would say the contact-quality and shape metrics look broadly believable, xSLG looks pretty well calibrated, xBA looks acceptable, and xwOBA is the main area that appears biased low relative to realised outcomes.
So the best single sentence summary is this: the stat ecosystem looks internally coherent, but the xwOBA model appears to be understating leaguewide offensive value.