OOTP 26 - General Discussions: Everything about the brand new 26th Anniversary Edition of Out of the Park Baseball - officially licensed by MLB, the MLBPA, KBO and the Baseball Hall of Fame.
#1 |
|
Bat Boy
Join Date: Jun 2025
Posts: 2
|
Batting Order research
I read Tango's book on vacation and got a wild hair to test his notions about batting order (widely taken as gospel in the baseball community), which he presents as a set of heuristics, e.g. the top three hitters go in 1, 2, and 4. But he makes little mention of OBP, baserunning / stealing, BB/K, or other stats that we know are important in the batting order. Can we quantify the importance of each metric for each spot in the batting order and see if Tango's wisdom is right?
I pulled MLB game and play-by-play data from 2022-2024 (after the DH change). For each team in each game, I calculated the following metrics for the starter in each batting order slot (1-9) in that game: wOBA, UBR (ultimate base running), wGDP (avoids GDPs), OBP, SLG, wSB (weighted stolen bases), BBrate, BBK, and BABIP. Then I ran a regression with all of those as the independent variables and the team's runs in the game as the dependent variable. Results are below. Note that some coefficients are zero. That's because I ran a lasso algorithm, which adjusts for multicollinearity and excludes variables with no predictive power.

R-squared: 0.835    Adj. R-squared: 0.835    No. Observations: 14833

coef (22-24), one row per batting-order slot. Column headers are the variable-name prefixes, so woba_l1 is the woba_l column at slot 1.

const   -1.9413

Slot    woba_l       ubr      wGDP       OBP       SLG       wSB    BBrate       BBK    BABIP_
1       2.3095    0.2982    0.1421   -1.1257    0.4533   -0.3763    0.0000    0.0690   -0.3633
2       1.5956    0.3197    0.1024   -0.7473    0.7008    0.0000    0.0000    0.0000   -0.2989
3       1.8882    0.3473    0.2283   -0.9293    0.7101   -0.3558   -0.3689    0.1177   -0.3545
4       1.5724    0.3367    0.1487   -0.7294    0.8002    0.0000   -0.5561    0.1725   -0.3829
5       2.4229    0.3122    0.0837   -1.2828    0.3930    0.0000    0.0000    0.0000   -0.3095
6       2.2028    0.3413    0.1183   -1.2923    0.4358    0.0000    0.0000    0.0000   -0.2576
7       2.7949    0.3577    0.1538   -1.7114    0.2319    0.0000    0.0000    0.0516   -0.2265
8       2.3715    0.3629    0.1118   -1.3609    0.3417   -0.4314    0.0000    0.0000   -0.3095
9       2.6184    0.3352    0.1297   -1.5338    0.1949   -0.3818    0.0000    0.0000   -0.1967

As you see we have an 83.5% R-squared over 14,833 observations - not bad. Some of the results are intuitive, like SLG4 having the highest coefficient in the SLG category. Others make less sense at first glance. Take the first four slots for wOBA and OBP as an example:

woba_l1  2.3095    woba_l2  1.5956    woba_l3  1.8882    woba_l4  1.5724
OBP1    -1.1257    OBP2    -0.7473    OBP3    -0.9293    OBP4    -0.7294

The OBP coefficients are negative, but this does NOT mean that we want a bad OBP guy at #1. It means that wOBA and OBP are correlated, i.e. wOBA already captures much of OBP, so OBP gets a negative coefficient that keeps us from double-counting its effect. The way I interpret the negative coefficients is that the more negative they are, the more it hurts to have a batter suck at that stat in that spot in the order. So we really do want a good OBP guy at #1. But have a look at OBP for slots 5-9:

OBP5 -1.2828    OBP6 -1.2923    OBP7 -1.7114    OBP8 -1.3609    OBP9 -1.5338

Now it's not clear that our best OBP guy should be #1. OBP7 seems more important, and so does woba_l7, for that matter! I got this same pattern when using 2019-2024 data (i.e. three more years of data), so I don't think it's an anomaly, but I cannot explain it.

Of course, wOBA and OBP are not the only variables here, and a significant chunk of a player's run value is explained by other stats. But even regressing on just OBP and wOBA, there is something important about the 7-hole that defies intuition:

OLS Regression Results with Lasso-selected and significant predictors (p <= 0.05):

OLS Regression Results
Dep. Variable:      b_r                 R-squared:            0.659
Model:              OLS                 Adj. R-squared:       0.659
Method:             Least Squares       F-statistic:          2385.
Date:               Sun, 29 Jun 2025    Prob (F-statistic):   0.00
Time:               12:40:41            Log-Likelihood:       -30028.
No. Observations:   14833               AIC:                  6.008e+04
Df Residuals:       14820               BIC:                  6.018e+04
Df Model:           12
Covariance Type:    nonrobust

              coef   std err         t    P>|t|    [0.025    0.975]
const      -3.9382     0.054   -73.297    0.000    -4.044    -3.833
OBP2        0.3557     0.151     2.359    0.018     0.060     0.651
OBP4       -0.3838     0.150    -2.560    0.010    -0.678    -0.090
OBP7       -0.4039     0.152    -2.650    0.008    -0.703    -0.105
woba_l1     3.2970     0.061    54.158    0.000     3.178     3.416
woba_l2     2.8710     0.133    21.583    0.000     2.610     3.132
woba_l3     3.2193     0.057    56.710    0.000     3.108     3.331
woba_l4     3.6385     0.132    27.504    0.000     3.379     3.898
woba_l5     3.0625     0.057    53.512    0.000     2.950     3.175
woba_l6     2.8208     0.058    48.538    0.000     2.707     2.935
woba_l7     3.1684     0.142    22.355    0.000     2.891     3.446
woba_l8     2.8598     0.058    49.614    0.000     2.747     2.973
woba_l9     2.6762     0.057    46.603    0.000     2.564     2.789

Omnibus:         1225.014    Durbin-Watson:       1.958
Prob(Omnibus):   0.000       Jarque-Bera (JB):    2278.586
Skew:            0.581       Prob(JB):            0.00
Kurtosis:        4.529       Cond. No.            20.1

Any idea what is going on here?

Next steps are to extend this analysis to earlier seasons by excluding pitchers batting, and to look at LHP/RHP splits and batter handedness. For example: does the data support alternating left- and right-handed batters? In the meantime I am trialing these lineup findings with the Braves in OOTP. Will post an update here if I run more data or figure out what's up with the 7-spot.
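One way to see how a stat can flip to a negative coefficient without being harmful is a toy example. The sketch below is not the regression from this post: the grid of values, the `woba_like` blend (on-base skill plus power, equal weights), and the "true" run values of 10 and 20 are all made up for illustration. It fits runs on OBP alone, then on OBP together with the blend, using a small hand-rolled least-squares helper (`ols`, also hypothetical).

```python
from itertools import product


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical toy data: a full grid, so on-base skill and power are uncorrelated.
grid = list(product([0.25, 0.30, 0.35, 0.40], [0.05, 0.10, 0.15, 0.20]))
obp = [on_base for on_base, _ in grid]
# Made-up blend of the two skills; these are NOT real wOBA weights.
woba_like = [on_base + power for on_base, power in grid]
# Made-up "true" run values: power is worth twice as much as getting on base.
runs = [10 * on_base + 20 * power for on_base, power in grid]

alone = ols([(o,) for o in obp], runs)          # runs ~ OBP
joint = ols(list(zip(obp, woba_like)), runs)    # runs ~ OBP + wOBA-like

print("OBP alone:     ", round(alone[1], 3))
print("OBP given blend:", round(joint[1], 3), "| blend:", round(joint[2], 3))
```

In this toy setup OBP is worth about +10 on its own and about -10 once the blend is in the model, because the blend over-credits on-base skill relative to power and the OBP term corrects for that. Under these assumptions the size of a negative coefficient says how much correction the companion stat needs, not how much a weak hitter hurts in that slot.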
#2 |
|
Hall Of Famer
Join Date: Jun 2014
Location: Juust a bit outside...
Posts: 6,297
|
This is too smart for me. However, I will be absolutely fascinated if you post a dummy version.
__________________
"Cannonball Coming!" Go Bucs!! Founder and League Caretaker of the Professional Baseball Circuit, www.probaseballcircuit.com An Un-Official Guide to Minor League Management in OOTP 21 Ratings Scale Conversion Cross-Reference Cheat Sheet |
#3 |
|
Global Moderator
Join Date: Nov 2002
Posts: 12,048
|
I'm not near the regression expert I wish I was, but there's enough odd stuff there that my guess is you did something wrong. What that is, I don't know, but maybe it was a simple data formatting error.
Like, that R-squared really does seem pretty high, which would be great, but it's almost too good to be true. Then the wOBA coefficients for slots 2-4 are lower than the rest, when wouldn't you expect them to be higher? OBP is negative, which you've seemingly explained, but are you sure about that, especially considering the wOBA numbers? Maybe try one set but not the other? SLG5 even looks suspiciously low to me. I'm also surprised by how many 0.00s you have. I don't know. Others here like RonCo would know better, but yeah, I think you made a mistake somewhere.
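On the 0.00s specifically: exact zeros are what a lasso penalty produces by design, so by themselves they do not point to a data error. The sketch below is a minimal coordinate-descent lasso on made-up data, not the pipeline used in this thread. The three +/-1 predictors, the effect sizes 3.0 / 0.05 / 0.0, and the penalty `lam=0.1` are all assumptions chosen for illustration.

```python
from itertools import product


def soft_threshold(z, lam):
    """Shrink z toward zero by lam; anything within [-lam, lam] becomes exactly 0.0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=50):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1, no intercept."""
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        for j in range(k):
            # Average product of feature j with the residual that ignores feature j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][m] * beta[m] for m in range(k) if m != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta


# Hypothetical toy design: three uncorrelated +/-1 predictors (8 rows).
X = [list(row) for row in product([-1.0, 1.0], repeat=3)]
# Made-up truth: a strong effect, a weak effect, and no effect at all.
y = [3.0 * a + 0.05 * b + 0.0 * c for a, b, c in X]

beta = lasso_cd(X, y, lam=0.1)
print(beta)
```

Here the strong predictor keeps a shrunken coefficient while the weak and irrelevant ones land on exactly zero, because their pull on the residual is smaller than the penalty. Note that the weak predictor has a real (if small) effect in this toy data and still gets zeroed, so a 0.0000 means "not worth the penalty", not "proven irrelevant".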
__________________
My OOTP Wishlist | My FAQ List OOTP Wiki | Your Recommended Team Nicknames, By City (A Crowdsourced Project) For Beta/Devs: Full screen (1920x1080) |
#4 |
|
Bat Boy
Join Date: Jun 2025
Posts: 2
|
I just re-ran the regression using only OBP and SLG, and here the results look a lot different. OBP is clearly preferred in slots 1-2, and SLG at #4 (like wisdom says). R^2 is lower, but the anomaly at #7 disappears!
Dep. Variable:   b_r    R-squared:        0.659
Model:           OLS    Adj. R-squared:   0.658

              coef   std err         t    P>|t|    [0.025    0.975]
const      -3.5564     0.057   -62.546    0.000    -3.668    -3.445
OBP1        1.9057     0.091    20.843    0.000     1.727     2.085
OBP2        1.9076     0.091    21.000    0.000     1.730     2.086
OBP3        1.6125     0.089    18.118    0.000     1.438     1.787
OBP4        1.6011     0.090    17.796    0.000     1.425     1.777
OBP5        1.5441     0.089    17.259    0.000     1.369     1.719
OBP6        1.3970     0.089    15.736    0.000     1.223     1.571
OBP7        1.3939     0.086    16.120    0.000     1.224     1.563
OBP8        1.4632     0.086    16.956    0.000     1.294     1.632
OBP9        1.4442     0.085    16.898    0.000     1.277     1.612
SLG1        0.9998     0.046    21.573    0.000     0.909     1.091
SLG2        0.9455     0.043    21.870    0.000     0.861     1.030
SLG3        1.1245     0.042    26.851    0.000     1.042     1.207
SLG4        1.1699     0.043    27.445    0.000     1.086     1.253
SLG5        1.0590     0.044    24.249    0.000     0.973     1.145
SLG6        0.9914     0.046    21.742    0.000     0.902     1.081
SLG7        0.9791     0.045    21.657    0.000     0.890     1.068
SLG8        0.9758     0.046    21.056    0.000     0.885     1.067
SLG9        0.8737     0.046    18.874    0.000     0.783     0.964

Omnibus:         1151.939    Durbin-Watson:       1.966
Prob(Omnibus):   0.000       Jarque-Bera (JB):    2141.570
Skew:            0.552       Prob(JB):            0.00
Kurtosis:        4.498       Cond. No.            12.5

But throw in wOBA and you get:

              coef   std err         t    P>|t|    [0.025    0.975]
const      -3.7229     0.058   -63.888    0.000    -3.837    -3.609
OBP1        0.8611     0.284     3.035    0.002     0.305     1.417
OBP2        1.2661     0.250     5.061    0.000     0.776     1.756
OBP3        0.8055     0.233     3.462    0.001     0.349     1.262
OBP4        0.7066     0.240     2.947    0.003     0.237     1.177
OBP5        0.4469     0.267     1.673    0.094    -0.077     0.971
OBP6        0.5972     0.281     2.126    0.034     0.047     1.148
OBP7       -0.1157     0.272    -0.426    0.670    -0.648     0.417
OBP8        0.1656     0.279     0.593    0.553    -0.382     0.713
OBP9        0.5081     0.277     1.831    0.067    -0.036     1.052
SLG1        0.4663     0.147     3.176    0.001     0.179     0.754
SLG2        0.5999     0.132     4.531    0.000     0.340     0.859
SLG3        0.7007     0.119     5.867    0.000     0.467     0.935
SLG4        0.7054     0.124     5.684    0.000     0.462     0.949
SLG5        0.4691     0.139     3.372    0.001     0.196     0.742
SLG6        0.5526     0.149     3.715    0.000     0.261     0.844
SLG7        0.1781     0.142     1.257    0.209    -0.100     0.456
SLG8        0.2835     0.146     1.937    0.053    -0.003     0.570
SLG9        0.3922     0.141     2.773    0.006     0.115     0.669
woba_l1     1.7975     0.466     3.859    0.000     0.884     2.711
woba_l2     1.1207     0.408     2.750    0.006     0.322     1.920
woba_l3     1.3879     0.370     3.747    0.000     0.662     2.114
woba_l4     1.5492     0.385     4.020    0.000     0.794     2.305
woba_l5     1.9122     0.434     4.402    0.000     1.061     2.764
woba_l6     1.4129     0.462     3.061    0.002     0.508     2.318
woba_l7     2.6334     0.444     5.928    0.000     1.763     3.504
woba_l8     2.2601     0.457     4.946    0.000     1.364     3.156
woba_l9     1.6051     0.448     3.582    0.000     0.727     2.484

Now the oddness has returned. I think the main issue is that wOBA is kind of a synthesis of OBP and SLG, so having all three stats in the model makes it hard to interpret any one set of coefficients in isolation.
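A rough way to put a number on "wOBA is kind of a synthesis of OBP and SLG" is a variance inflation factor (VIF): regress the wOBA column on the OBP and SLG columns and compute 1 / (1 - R^2). The sketch below does this on made-up data. The 0.7 / 0.3 blend weights, the grid values, and the small `wobble` term are assumptions for illustration, not real wOBA weights or data from this thread, and `ols` is a hypothetical hand-rolled helper.

```python
from itertools import product


def ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


obp_vals = [0.28, 0.31, 0.34, 0.37]   # hypothetical OBP values
slg_vals = [0.35, 0.40, 0.45, 0.50]   # hypothetical SLG values
rows, woba_like = [], []
for (i, o), (j, s) in product(enumerate(obp_vals), enumerate(slg_vals)):
    # Small checkerboard term: the part of the blend that OBP and SLG cannot explain.
    wobble = 0.004 if (i + j) % 2 == 0 else -0.004
    rows.append((o, s))
    woba_like.append(0.7 * o + 0.3 * s + wobble)   # made-up weights, NOT real wOBA

# Regress the wOBA-like column on OBP and SLG, then turn R^2 into a VIF.
b = ols(rows, woba_like)
fitted = [b[0] + b[1] * o + b[2] * s for o, s in rows]
mean = sum(woba_like) / len(woba_like)
ss_res = sum((w - f) ** 2 for w, f in zip(woba_like, fitted))
ss_tot = sum((w - mean) ** 2 for w in woba_like)
r2 = 1 - ss_res / ss_tot
vif = 1 / (1 - r2)

print("R^2 of blend on OBP+SLG:", round(r2, 4))
print("VIF:", round(vif, 2), "| std-error inflation:", round(vif ** 0.5, 2))
```

The square root of the VIF is roughly how much wider a coefficient's standard error gets from the overlap alone. That is in the same spirit as the jump in the two tables above, where the OBP1 standard error goes from 0.091 to 0.284 once wOBA enters. Running the same check on the real per-slot columns would show how much of the 7-hole oddity is within the noise.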
#5 |
|
Global Moderator
Join Date: Nov 2002
Posts: 12,048
|
Yeah, the top one looks a lot more like what I'd expect. Sometimes less is just better.
#6 |
|
All Star Starter
Join Date: Nov 2019
Posts: 1,183
|
You can crunch all the numbers you like. The human element will override any “gospel” on batting orders.
__________________
“Baseball isn’t statistics; it’s Joe DiMaggio rounding second.” “Once, centuries ago, it was the beloved national pastime of the Americas, Wesley. Abandoned by a society that prized fast food and faster games. Lost to impatience.” “ The term ‘WAR’ should be replaced by ‘WAG’. WAR isn’t an actual measurement; it’s just a wild-ass guess” -Bill James RIP National League 1876-2022 Floreat semper vel invita morte. I make custom ballparks. |
#7 | |
|
Minors (Single A)
Join Date: Jan 2017
Posts: 77
|
Quote:
Facts.
__________________
This Naps logo is mine but if you'd like to use it, feel free to ask. I have other fictional logos as well. |