| OOTP 25 - General Discussions Everything about the brand new 25th Anniversary Edition of Out of the Park Baseball - officially licensed by MLB, the MLBPA, KBO and the Baseball Hall of Fame. |
#1 |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
The effect of pitcher BABIP on player evaluations in historical sims
This post concerns historical sims and the effect of the Pitcher BABIP rating on how the game evaluates players (or at least how it presents its evaluation).
I'll try to keep this short. The existence of the Pitcher BABIP rating undermines the rationale of the modern defensive metrics that the game uses to evaluate players. This normally reveals itself in the fielding WAR portion of the position player evaluation and in the entirety of the WAR evaluation for pitchers.
I am not saying this is BAD. The huge benefit of Pitcher BABIP is that historical sims are more accurate when compared to the historical stat lines for pitchers. For most historical players, I assume this is a huge plus. Please do not construe this post as lobbying for the removal of Pitcher BABIP. I think a better approach would be to incorporate it more accurately in areas where it seems to be ignored.
However, in historical sims players change teams in ways that they did not in the real world. A mediocre defender will see his defensive performance greatly improve when playing behind pitchers with a low BABIP. This is because, unlike in the modern game, it is the pitcher who is driving a significant portion of the defensive outcomes, not the fielders.
What does this mean? High-quality defenders will be overrated and low-quality defenders will be underrated by the AI. You might see, for example, a second baseman near the top of the various lists in assists, putouts, double plays, etc., and then see that the game assigns him a significant defensive WAR. All-Star selections and post-season awards are significantly impacted, so I basically assign all of the post-season awards myself except Silver Slugger (which is purely offensive and unaffected).
Pitcher WAR in the game is based on FIP, which makes perfect sense for the modern game. But because of the Pitcher BABIP rating, pitching is not fielding-independent in the historical game. As a result, All-Star selections and post-season awards can be especially wacky in the game. I've just accepted the All-Star selections because it's not worth the effort to deal with them.
Anyway, as someone who plays a lot of historical sims, I just wanted to share these observations. Last edited by uruguru; 01-01-2025 at 07:09 PM. |
|
|
|
|
|
#2 |
|
Hall Of Famer
Join Date: Nov 2005
Posts: 3,120
Infractions: 0/1 (1)
|
This sounds more like a criticism of the defensive metrics in the game and how they are used than the PBABIP rating/usage?
|
|
|
|
|
|
#3 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
It's not a criticism of anything. I think the defensive metrics (and ratings) may be very useful in the non-historical sims, but they are far less valuable in historical sims. So any OOTP players of historical sims should understand that, for example, trading for Brooks Robinson or Mark Belanger or, I dunno, Cesar Tovar will not give you the defensive benefits you hope for. But they will probably still win the Gold Glove even if their defensive performance is only average because your pitchers have poor BABIP ratings. |
|
|
|
|
|
|
#4 | |
|
Hall Of Famer
Join Date: Nov 2005
Posts: 3,120
Infractions: 0/1 (1)
|
Quote:
If the game is incorporating PBABIP but not factoring that into fielding stats, then that is a hole in the fielding stats (I mean, fielding stats based on raw fielding statistics have always been pretty bad). I don't think the game has significantly lessened the impact of good vs. mediocre/poor fielders by giving some outlier pitchers a little control. Maybe if you could provide some specific examples of how you came to your conclusions, it would help your point? Last edited by Rain King; 01-01-2025 at 08:31 PM. |
|
|
|
|
|
|
#5 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
|
This is not true and not how the fielding ratings are handled in the game.
For example, in my current game, which started in 1871 and is in the 2004 season, Mark Belanger won 13 Gold Gloves and Brooks Robinson won 10 Gold Gloves, and they did not play for the Orioles with that pitching staff. You should look at EFF, which is defensive efficiency, when looking at fielding results. Players with a 1.000 EFF are league average and are not going to accumulate ZR, regardless of how many PO or Assists they accumulate. If anything, being on a pitching staff with low strikeouts will drive up PO and Assists, but that still has nothing to do with EFF, which is the rate, relative to the league average at that position, of turning a ball in play at your position into an out.
In my current game Tovar had a 1.073 EFF combined across LF/CF/RF, Belanger had a 1.083 EFF at SS (300 ZR), and Brooks had a 1.042 EFF at 3B. There are times when you can have two SS who both have, say, a 1.05 EFF but different ZR totals, because the difficulty of the plays and the base-out situations will be different for those two defenders, and both factor into the ZR evaluation.
You are also greatly overestimating the effect and strength of the pitcher BABIP ratings, which are extremely subtle. Striking batters out, which prevents a ball in play entirely, is always going to significantly outweigh a few points of pBABIP in terms of run prevention.
Consider a league where the average team allows 1300 hits on balls in play with a .300 BABIP. Team A strikes out 100 more batters than average and plays average defense. Team B strikes out 100 fewer batters than average. Team A has already saved 30 hits by not allowing the ball to be put in play an extra 100 times (100 * 0.300 = 30). Team B is going to allow 30 more hits than average over the season if they play average defense, simply because there are more balls in play against them.
How good does the defense of Team B need to be in order to make up the difference of 60 hits allowed between them and Team A? If there is an average of 1300 hits on balls in play per team, then Team A with average defense will allow 1270 and Team B will allow 1330. For Team B to turn that 1330 into 1270, they will need to have an average defensive efficiency at each position of 1330/1270 = 1.047. If the league BABIP is .300, then the league DEF is .700 (1.000 - 0.300 = 0.700). Their team DEF would need to be .700 * 1.047 = .733. This would be elite defense. We are not even getting into BB allowed either, which also affects run prevention. Last edited by Garlon; 01-01-2025 at 08:59 PM. |
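The strikeout-versus-defense arithmetic in the post above can be reproduced with a short sketch. The function names are hypothetical and the inputs are the post's own illustrative figures (1300 hits, .300 BABIP, plus or minus 100 strikeouts). This follows the post's back-of-the-envelope method and is not a description of how OOTP computes anything.

```python
# Illustrative only: numbers come from the post, function names are made up.

def hits_on_balls_in_play(avg_hits, extra_strikeouts, league_babip):
    """Hits allowed with average defense when a staff strikes out
    `extra_strikeouts` more batters than average (negative means fewer)."""
    return avg_hits - extra_strikeouts * league_babip


def required_team_def(hits_now, hits_target, league_babip):
    """Efficiency ratio and team DEF needed to cut hits_now to hits_target,
    using the post's approximation (ratio applied to league DEF)."""
    ratio = hits_now / hits_target
    league_def = 1.0 - league_babip
    return ratio, league_def * ratio


team_a = hits_on_balls_in_play(1300, 100, 0.300)    # 100 more strikeouts
team_b = hits_on_balls_in_play(1300, -100, 0.300)   # 100 fewer strikeouts
ratio, team_def = required_team_def(team_b, team_a, 0.300)

print(f"Team A hits: {team_a:.0f}, Team B hits: {team_b:.0f}")
print(f"Required efficiency ratio: {ratio:.3f}, team DEF: {team_def:.3f}")
```

Changing the strikeout gap or the league BABIP shows how quickly the required team DEF moves away from the .700 league baseline.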
|
|
|
|
|
#6 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
I currently have a centerfielder who is average, at best, defensively, yet he is consistently near the top of the league in putouts. In other words, his defensive range in the sim (the pct of balls he converts to outs) is often higher than that of other CF who have a much higher outfield range rating. The only conclusion I can draw from this is that it happens because I have a staff that is deep in pitchers with a low BABIP. In other words, their pitching BABIP rating is forcing more batted balls into outs than other pitchers do. Which means that the effective range of my fielders is greater than what you would expect from their ratings alone. That has to be true to some degree, otherwise there would be no point in the pitcher BABIP rating.
However, when I look at my awesomely fielding centerfielder with average ratings, I can see that OOTP consistently gives him a negative fielding WAR... and higher-rated centerfielders who catch fewer balls consistently have positive fielding WAR and win Gold Gloves. Why is that? Without access to OOTP's formula for calculating fielding WAR, I can only assume it is biased towards difficult-to-catch balls, where fielding ratings still make a difference. But obviously I don't know that; I only know what I see in the sim. And what I see is fielders overperforming their defensive ratings, often significantly, when they are playing with low BABIP pitchers. Last edited by uruguru; 01-01-2025 at 09:19 PM. |
|
|
|
|
|
|
#7 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
|
Please post a screen shot of your CF defensive statistics, fielding stats>career fielding stats. Also provide their OF Range/Error/Arm ratings on the 100 scale.
Just because a player is near the top in total PO in CF does not mean they are overperforming. Your pitching staff may be low on strikeouts and high in flyball rate which drives up the raw totals. |
|
|
|
|
|
#8 |
|
Hall Of Famer
Join Date: Nov 2005
Posts: 3,120
Infractions: 0/1 (1)
|
The effective range isn't better for your center fielder. They just get more easy balls. Their raw number does not represent their range. I'm not sure where the term "effective range" comes from?
The game tracks plays in the following categories: Routine, Likely, Even (i.e. 50/50 plays), Unlikely, and Impossible. There is a lot more value per play as you move to the right of that spectrum. |
|
|
|
|
|
#9 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
|
There is a known issue with the outfielder ARM statistic not being tracked accurately and this can sometimes cause an incorrect GG winner to be selected in the OF.
|
|
|
|
|
|
#10 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
Look at 2 guys in the attached image: Glenn Beckert and Felix Millan. OOTP wants to give the Gold Glove to Millan.
Look at the Range stat first: Beckert had the highest range (actual successfully fielded balls) per 9 innings of any 2B. It's not even close. He was at 6.25, and then there was a clump of fielders from 5.60 downward, starting with Millan.
Now look at Zone Rating: Millan has an amazing 18.0 ZR, but where is Beckert? He's not on the list because his ZR is negative. And not just barely negative, but disastrously so: -9.8 ZR, probably one of the worst of any starting 2B.
You can't really chalk up the difference to a high-K vs a low-K staff, because there was only a 50 strikeout difference between their two pitching staffs over the whole season. To illustrate that point further, Davey Johnson had almost the same range as Millan (5.55 to 5.60), yet his team struck out 130+ more batters than Millan's team, making his chances that much less common. And yet he also had a -5.7 ZR. As with Beckert, OOTP has somehow deemed his defensive performance below average.
Here are the average SP BABIP ratings for Millan's and Beckert's teams:
Beckert 45.1
Millan 50.2
This is a surprise! Beckert had the greatest actual range of fielded balls for 2B (6.25 per 9 IP) despite his pitching staff having the worst BABIP of the three staffs! And remember, worse BABIP means more batted balls become hits. And yet OOTP has assigned him possibly the worst ZR of any 2B.
So maybe the problem is the defensive metrics (your first thought). My 2B has been getting the same treatment as Beckert (negative ZR despite good on-field performance), but I assumed it was because I stockpiled a staff with high BABIP ratings.
Here are their range and turn DP ratings (1-100 scale):
Beckert 54 & 59
Millan 85 & 61
So Millan is clearly rated as the superior defender. And despite their on-field performances, OOTP gives them ZR scores that coincidentally match their ratings.
Without knowing the internal calculation of the ZR and Efficiency stat, I don't know how to investigate this any further. Last edited by uruguru; 01-01-2025 at 10:19 PM. |
|
|
|
|
|
|
#11 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
But pitchers with good BABIPs force more balls to be converted to outs. That's an observable fact. If I had to guess at the implementation, I think that would mean that batted balls against them become easier to field (thus more become outs). So fielders behind pitchers with good BABIPs would have fewer difficult balls to catch, and that makes the defensive ratings of players on those teams less important. That was sort of the point I was trying to make. |
|
|
|
|
|
|
#12 | |
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
Not going to engage with you. Have a nice day. Last edited by uruguru; 01-01-2025 at 10:33 PM. |
|
|
|
|
|
|
#13 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,256
|
This comes down to the fact that you are not understanding that Range (PO + A)/9 innings is a completely antiquated and meaningless statistic. It should actually be more accurately titled Range Factor in the game though.
The answer as to why Felix Millan is going to win the Gold Glove is right there on that page at the bottom right. Millan has a Defensive Efficiency of 1.099, which means he turns 9.9% more balls batted to him into outs than the average second baseman that season. Beckert has a Defensive Efficiency of 0.982, which means he is 1.8% below league average at turning batted balls to him into outs.
Millan made 360 Assists, and 360/1.099 = 328. 360 - 328 = +32 outs made. This makes sense considering he has a +18 ZR for the season. Beckert made 459 Assists, and 459/0.982 = 467. 459 - 467 = -8 outs made. I expect Beckert will be about -4 or -5 ZR for the season. You said he was -9.8 ZR, and that may just be due to the base-out situations of the balls he fielded. Missing a groundball with nobody on base and 2 outs is going to be less detrimental than missing one with the bases loaded.
The difference between Millan and Beckert is 40 defensive plays (+32 for Millan and -8 for Beckert). The difference between their ZR is 27.8 (+18 for Millan and -9.8 for Beckert). This makes sense because 27.8 runs over 40 plays = 27.8/40 = 0.695, and infield defensive plays are generally worth about 0.70 to 0.75 defensive runs per play. There is no coincidence about it when Millan has an 85 range rating and Beckert has a 54 range rating.
You are getting caught up on the raw totals of Assists and PO. These are not meaningful. PO for 2B are going to go way up if a team allows more baserunners, because there are more force plays at 2B. They will also go up with more flyballs to the infield. They will also go up with more balls in play in general. None of this has anything to do with pBABIP ratings.
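The arithmetic in the paragraphs above can be checked with a small sketch. It assumes, as the post does, that plays made divided by EFF approximates the plays a league-average fielder would have made on the same chances. The function name is hypothetical. The post rounds intermediate values to whole plays, so the unrounded per-play figure here comes out slightly below the post's 0.695.

```python
# Illustrative only: inputs are the assists, EFF and ZR values quoted in the thread.

def plays_above_average(plays_made, eff):
    """Plays made minus the plays an average fielder would make on the same
    chances, using the post's approximation expected = plays_made / eff."""
    expected = plays_made / eff
    return plays_made - expected


millan = plays_above_average(360, 1.099)    # about +32
beckert = plays_above_average(459, 0.982)   # about -8

zr_gap = 18.0 - (-9.8)                      # ZR difference quoted in the thread
runs_per_play = zr_gap / (millan - beckert)

print(f"Millan: {millan:+.1f} plays, Beckert: {beckert:+.1f} plays")
print(f"Implied runs per play: {runs_per_play:.2f}")
```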
Basically, you should completely ignore RNG because it is just an old stat that has always been reported in baseball fielding statistics and has never meant anything because the balls in play are very different from team to team, the balls in play to each position will vary from team to team, the GO:AO rate will vary from team to team, and so forth. What is important is how often a fielder turns a ball in their zone into an out and that is reported as their EFF. For outfielders, the most important rating is their Range. Outfielders do not make a substantial number of errors or assists, but they do make hundreds of PO. For a 2B you will want to look at their combination of Range and Error. For SS Range/Arm/Error all need to be very good. For 3B Arm is going to be more important than Range. Last edited by Garlon; 01-01-2025 at 11:45 PM. |
|
|
|
|
|
#14 |
|
Hall Of Famer
Join Date: Nov 2005
Posts: 3,120
Infractions: 0/1 (1)
|
|
|
|
|
|
|
#15 |
|
OOTP Developer
Join Date: Jun 2009
Location: Here and there
Posts: 15,843
|
As discussed, I think this is probably more related to what metrics you use to evaluate. Our ZR is based on more modern fielding metrics - generally speaking, it will look at the various difficulty buckets and see what plays players are making. Converting a difficult ball to an out is much more valuable than making 10 or whatever routine plays.
How exactly that mixes in with PBABIP and the range of difficulty of balls in play is a very complicated topic. You might have pitchers with a better BABIP cause softer contact and more easy-to-field balls. But as demonstrated with the ZR above, if you have a below-average defender, they're still liable to flub those balls. Also, many of the various awards and evaluations are not based solely on FIP-WAR. AI evaluation certainly will factor in pitcher BABIP ability, and a lot of the All-Star and awards voting can give credit to RA9-WAR. It may not be perfect, sure. But we're happy to continue to make things better. |
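To see why one difficult play can outweigh many routine ones under a bucket-based metric, here is a minimal sketch of a generic plus/minus calculation over the five difficulty categories named earlier in the thread. This is NOT OOTP's ZR formula, which is not public in this thread. The league conversion rates and the function name are assumptions chosen purely for illustration.

```python
# Hypothetical league-average conversion rate per difficulty bucket.
# These rates are invented for the example, not taken from the game.
LEAGUE_RATE = {
    "Routine": 0.95,
    "Likely": 0.75,
    "Even": 0.50,
    "Unlikely": 0.25,
    "Impossible": 0.05,
}


def plays_above_expected(chances):
    """chances maps bucket name -> (plays made, plays missed).
    A made play earns (1 - league rate); a missed play costs the league rate."""
    total = 0.0
    for bucket, (made, missed) in chances.items():
        rate = LEAGUE_RATE[bucket]
        total += made * (1.0 - rate) - missed * rate
    return total


ten_routine = plays_above_expected({"Routine": (10, 0)})
one_unlikely = plays_above_expected({"Unlikely": (1, 0)})

print(f"Ten routine plays: {ten_routine:+.2f}")
print(f"One unlikely play: {one_unlikely:+.2f}")
```

Under these assumed rates, a single converted "Unlikely" chance is worth more than ten routine plays, which is the general shape of the point made in the post above.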
|
|
|
|
|
#16 |
|
All Star Starter
Join Date: Feb 2021
Posts: 1,445
|
Homing in on this part since my understanding of defensive metrics in OOTP has always been a bit shaky and I'm hoping to understand better. This is essentially saying Millan had 32 OAA this season, correct? Isn't that pretty outrageously high? A quick look at Fangraphs shows it would be the second-best OAA in a season in the past 10 years (as long as it's been tracked).
|
|
|
|
|
|
#17 | |||
|
Hall Of Famer
Join Date: Apr 2002
Location: Iowa
Posts: 6,698
|
Quote:
Quote:
__________________
Quoted from another sports gaming forum.. Quote:
|
|||
|
|
|
|
|
#18 | ||
|
Hall Of Famer
Join Date: Apr 2002
Location: Iowa
Posts: 6,698
|
Quote:
__________________
Quoted from another sports gaming forum.. Quote:
|
||
|
|
|
|
|
#19 | |||
|
All Star Starter
Join Date: May 2022
Posts: 1,268
|
Quote:
Quote:
So here's the issue (imo). If defensive awards are based more upon advanced fielding metrics (which btw makes perfect sense in the Statcast era, where Pitcher BABIP is not a thing), then you can easily see how Pitcher BABIP can weaken the relevance of fielding ratings and the advanced metrics.
For example, let's say a typical fielder gets 70% "easy" balls and 30% "difficult" balls to field. That means that on every batted ball, there's a 30% chance for good fielders to provide value over mediocre fielders (these are obviously made-up numbers just for the sake of the point). 30% feels very significant. However, if you have a historical staff with really high BABIP ratings, then that 30% number is going to drop for your team. Suddenly, your mediocre fielder might be successfully converting MORE batted balls into outs than a good fielder on a team with a poor BABIP staff.
In fact, if your staff is really reducing the number of difficult balls in play, then it becomes harder to justify choosing defense when you are trying to decide between defense and offense for a position player. If you can concede that, then that's really the crux of the point. At that point, by all available metrics (<<key phrase), the mediocre fielder had a better defensive season than the good fielder.
Quote:
If that seems unfair, then think about batters. They don't get Silver Slugger awards based upon how hard they hit the ball. They get them based on how their batted balls turned out, even against crappy defenses. In the end, a swinging bunt single counts more for the Silver Slugger than a screaming line drive caught at the fence, because it's still a hit and the line drive is still an out. Last edited by uruguru; 01-02-2025 at 12:14 PM. |
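The 70/30 argument in the post above can be put into numbers with a small sketch. Only the 30% difficult share comes from the post, which itself calls the figure made up. The conversion rates, the 20% share behind a strong BABIP staff, and the function name are all hypothetical; the sketch illustrates the raw out-rate argument only and says nothing about how OOTP's engine or ZR works.

```python
# Illustrative only: every rate below is invented for the example.

def out_rate(difficult_share, easy_conv, difficult_conv):
    """Overall share of batted balls turned into outs, given the share of
    difficult balls and the fielder's conversion rate on each kind."""
    easy_share = 1.0 - difficult_share
    return easy_share * easy_conv + difficult_share * difficult_conv


# Good fielder behind a staff that allows 30% difficult balls.
good = out_rate(0.30, easy_conv=0.95, difficult_conv=0.50)

# Mediocre fielder behind a staff that (by assumption) allows only 20%.
mediocre = out_rate(0.20, easy_conv=0.95, difficult_conv=0.30)

print(f"Good fielder raw out rate:     {good:.3f}")
print(f"Mediocre fielder raw out rate: {mediocre:.3f}")
```

With these assumed inputs the mediocre fielder posts the higher raw out rate, which is the scenario the post describes. A metric that credits plays by difficulty would still separate the two fielders, which is the counterpoint made elsewhere in the thread.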
|||
|
|
|
|
|
#20 | |
|
All Star Starter
Join Date: Feb 2021
Posts: 1,445
|
Quote:
And I would certainly hope errors have less than nothing to do with GG awards, given that they don't provide any meaningful information about the quality of defense. Last edited by MathBandit; 01-02-2025 at 12:30 PM. |
|
|
|
|