OOTP 14 - General Discussions Discuss the new 2013 version of Out of the Park Baseball here! |
11-12-2013, 12:38 PM | #1 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Ratings study
Spoiler alert: Do not read on if you are not interested in figuring out how the ratings work "under the hood."
I've thought for a while about how best to prioritize player ratings. In other words, would I rather have an extra point in Power or in Contact, and is an extra point in Contact worth three in Eye? There are a couple of ways to figure this out; here's one that I followed through to see the results.
I pulled up a 2013 MLB quickstart, turned off scouting, and kept the ratings scale at 1-20, which is generally what I use. I created a new player ("Johnny Average") and edited all of his ratings to 95/200. I chose this number because it's in the midrange of what is considered a "10" on the 1-20 scale. (I considered putting him at 100/200 but decided that might complicate things because it's right on the cusp of what is considered 10 vs 11.) Then I modified each of the hitting attributes independently to see what the projected stats for the player would be. For example, for Power, I would change the rating to 5, 15, 25, 35, etc. (which gives a rating of 1, 2, 3, 4... on the 1-20 scale) and check the results.
There were a few things I learned, or more accurately, already knew but had solidified in the process.
First, the Contact rating is calculated from the BABIP rating, the Avoid K's rating, and the Power rating, so in order to modify Contact I actually had to modify the BABIP rating first, and then the Avoid K's rating.
Second, the Gap and Eye ratings are totally independent of the other ratings. In other words, an increase in Gap will not increase the total number of hits at all; it will just turn more of the hits into doubles or triples. Power is different: an increase in that rating results in more home runs and somewhat more hits, because there are fewer balls in play overall, which ends up inflating the Contact rating.
Third, if the Contact rating is held constant, Avoid K's does not affect the overall output (OBP or SLG) of a player.
The next step was converting the player's output into wOBA and wRAA.
I used the 2012 linear weights from Fangraphs to do so. One point I debated was whether to use their coefficient for BB as is, since it is technically supposed to apply only to unintentional walks. At first I did the analysis using .9 of this coefficient to reflect that approximately 1 in 10 walks is intentional, but I then realized that this correction would likely make the end result less accurate, because it assumes players are intentionally walked at a rate proportional to how often they walk in general, which is definitely not the case. So I ended up just using the listed coefficient without a correction. The wOBA formula that I used does not include sac flies or intentional walks in the denominator, which more or less cancel out anyway.
One additional catch to the walks analysis is that increasing the walks increases the overall plate appearances, so I also put in a column that "normalizes" the output to the 603 plate appearances that occurred in the other columns.
From there, I was able to see the relationships between each individual attribute and wOBA and wRAA. I found wRAA helpful because it gave a clear sense of how many runs you would gain from an increase of one point in each attribute. For example, for every point in Contact above average, you would expect to gain about four runs. (Or about five or six singles, if you don't like advanced metrics.) Of some interest, for Contact, Power, and Eye there's a different relationship above and below the average; for example, every point in Contact BELOW the average will actually cause you to lose about eight runs!
In the final column, I smoothed the data out so that I had a more granular sense of how many runs you would gain from each point. Then I used these numbers to estimate how many runs above average everyone on the 2013 Seattle Mariners would give you. (I've been playing with this team in a dynasty recently.)
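For anyone who wants to reproduce the conversion described above, here is a minimal Python sketch. It assumes the simplified denominator mentioned in the post (AB + BB + HBP, no sac flies or intentional walks) and the standard wRAA definition. The class and function names are made up for illustration, the 2012 weights are values recalled from Fangraphs that should be checked against their published table, and the stat line is hypothetical rather than taken from the spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinearWeights:
    """One season of linear weights; supplied by the caller, not by OOTP."""
    bb: float
    hbp: float
    single: float
    double: float
    triple: float
    hr: float
    league_woba: float
    woba_scale: float


def woba(w, ab, bb, hbp, singles, doubles, triples, hr):
    """wOBA with the simplified denominator AB + BB + HBP (no SF, no IBB)."""
    numerator = (w.bb * bb + w.hbp * hbp + w.single * singles
                 + w.double * doubles + w.triple * triples + w.hr * hr)
    return numerator / (ab + bb + hbp)


def wraa(w, player_woba, pa=603):
    """Runs above average over `pa` plate appearances (603 as in the post)."""
    return (player_woba - w.league_woba) / w.woba_scale * pa


# ASSUMPTION: 2012 values as recalled from Fangraphs; verify before relying on them.
WEIGHTS_2012 = LinearWeights(bb=0.691, hbp=0.722, single=0.884, double=1.257,
                             triple=1.593, hr=2.058,
                             league_woba=0.315, woba_scale=1.245)

# Hypothetical projected line (550 AB + 50 BB + 3 HBP = 603 PA).
line = dict(ab=550, bb=50, hbp=3, singles=100, doubles=28, triples=3, hr=18)
player_woba = woba(WEIGHTS_2012, **line)
print(round(player_woba, 3), round(wraa(WEIGHTS_2012, player_woba), 1))
```

Passing the weights in as an object keeps the formula separate from any particular season, so the same functions work if you swap in a different year's coefficients.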
In case you were wondering, Mike Morse was projected to be the best hitter on the team with 27 wRAA, followed by Kendrys Morales at 23.5, Jesus Montero at 21, and Justin Smoak at 20. (You can see why the Mariners did not exactly meet expectations this year.)
There are quite a few things that this leaves out, obviously: speed (which affects the ratio of doubles and triples), baserunning (including steals), defense and positional adjustments, and replacement level.
For those who don't want to look at the attached spreadsheet, here's my summary:
1. Each point in Contact above 10 adds 4 runs; each point below subtracts 8. Contact is the most important rating by this analysis.
2. Each point in Gap above or below 10 adds or subtracts one run.
3. Each point in Power above 10 adds 3 runs; each point below subtracts 1.5.
4. Each point in Eye above 10 adds 2.5 runs; each point below subtracts 1.5.
5. Avoid K's will not affect your expected run output by this analysis.
One interesting point within this is the potential benefit of platooning. On average, a player gains about one point per attribute when he has the platoon advantage compared to when he doesn't. This works out to a little more than a 10-run advantage, and that is before incorporating the effect of the pitcher's rating differential.
Very interested to hear what others think. I'll also take this approach to the pitcher ratings and see what I find.
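The five-point summary above can be turned into a rough calculator. This is only a sketch of the rounded summary figures, not the smoothed per-point values in the spreadsheet; the function and dictionary names are made up for illustration, and it assumes the break sits exactly at 10 on the 1-20 scale.

```python
# (runs gained per point above 10, runs lost per point below 10),
# copied from the rounded summary in the post.
BATTER_RUNS_PER_POINT = {
    "contact": (4.0, 8.0),
    "gap": (1.0, 1.0),
    "power": (3.0, 1.5),
    "eye": (2.5, 1.5),
    "avoid_k": (0.0, 0.0),
}


def batting_runs_above_average(ratings, baseline=10):
    """Estimate batting runs above average from 1-20 ratings."""
    total = 0.0
    for name, rating in ratings.items():
        above, below = BATTER_RUNS_PER_POINT[name]
        diff = rating - baseline
        # Use the steeper or shallower slope depending on the side of the break.
        total += diff * (above if diff >= 0 else below)
    return total


# Hypothetical hitter: good power, decent contact, poor eye.
hitter = {"contact": 12, "gap": 10, "power": 14, "eye": 8, "avoid_k": 5}
print(batting_runs_above_average(hitter))  # 8 + 0 + 12 - 3 + 0 = 17.0

# Platoon advantage: one extra point in every attribute for an average hitter.
platoon = {name: 11 for name in BATTER_RUNS_PER_POINT}
print(batting_runs_above_average(platoon))  # 4 + 1 + 3 + 2.5 + 0 = 10.5
```

The second call reproduces the "little more than 10 runs" platoon figure from the post.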
11-12-2013, 05:10 PM | #2 |
Hall Of Famer
Join Date: Apr 2007
Location: Toronto
Posts: 9,162
|
OOTP ratings have either a linear or a piecewise linear relationship with stats output. Most are two-part linear with the break at 100/200. So if you look, say, at the Power rating, as you increase Power by one point for someone below 100/200, his expected HR total will increase by 0.16 per 550 AB. If you increase Power by one point for someone above 100/200, his expected HR total will increase by 0.32 per 550 AB.
So because the slope change always occurs at 100/200, you would want to use that as your baseline in this kind of study. And for batters, output is normalized by AB and not by PA (the output is based on 550 AB).
For pitchers, you'll find that Control matters least for pitchers already above average, and matters most for pitchers well below average. Movement matters most for above average pitchers, though Stuff is also important.
When you do study pitching, you'll need to reverse engineer the calculation of the Movement rating, since the value used by the game combines the Movement rating in the editor with the pitcher's GB%. And the Stuff rating in the Editor is what would be used for a starting pitcher. A pitcher used in relief will get a Stuff bonus depending on his repertoire.
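As a quick illustration of the two-part linear relationship described above, here is a small Python sketch using the Power example (0.16 HR per point below 100/200, 0.32 per point above, per 550 AB). The function name is hypothetical, and the result is a change relative to a 100/200 player, not an absolute home run total, since the post does not give the baseline value.

```python
def expected_hr_change(power, baseline=100, slope_below=0.16, slope_above=0.32):
    """Change in expected HR per 550 AB relative to a 100/200 Power rating.

    Slopes are the per-point values quoted in the post; the break is at 100/200.
    """
    diff = power - baseline
    slope = slope_above if diff >= 0 else slope_below
    return diff * slope


for rating in (50, 95, 100, 105, 150):
    print(rating, round(expected_hr_change(rating), 2))
```

Because the slope doubles above the break, a player at 95/200 and a player at 105/200 are not symmetric around average, which is one reason to use 100/200 as the baseline.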
11-12-2013, 07:03 PM | #3 |
Hall Of Famer
Join Date: Nov 2002
Posts: 3,584
|
Am I correct in my understanding that you didn't get any results from simming out the games but just from using the projected stats for the players?
11-13-2013, 11:42 AM | #4 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Yes, this is all just straight from the projections. I've considered some ways to test this and some of the other attributes (speed, defense, etc) though these would be a lot more complicated.
|
11-13-2013, 11:43 AM | #5 | |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Quote:
|
|
11-13-2013, 03:04 PM | #6 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
pitchers
For pitchers, here's what comes out:
- Each point above or below 10 for Stuff is worth four runs.
- Each point above or below 10 for Movement is worth 4.5 runs.
- Each point above 10 for Control is worth 2.5 runs; however, every point below 10 is worth six runs. This is consistent with injury log's comments.
For relievers, I divided their final values by 3 since relievers pitch about 1/3 as many innings. This is consistent with the stats generator in the editor, which uses a baseline of 900 at-bats for starters and 300 for relievers. Injury log's point about the boost to Stuff is well taken, though since the slope of the line doesn't change as it crosses 10, it doesn't really matter where you start counting from as long as you are comparing apples to apples, reliever to reliever. For what it's worth, the starter-to-reliever adjustment appears to give you 8/200 more points of Stuff.
Looking at the Seattle Mariners, it shows King Felix with about a 2.5-win projected advantage over the second-best starter, Hisashi Iwakuma, with all the remaining starters within a span of about one win. Among the relievers, the top one (Tom Wilhelmsen) is about a win better than the worst (Lucas Luetge). Sounds about right to me.
The platoon advantage is about the same for pitchers: about one point (on the 1-20 scale) per attribute. So when you gain the platoon advantage as a pitcher, it's a ten-run gain if you are above the Control cut-point, and more like fifteen if you are below. Adding up the advantage on offense and defense, you would gain about 20-25 runs total by gaining the platoon advantage. However, there's approximately a 15-run penalty (.034 loss in wOBA) associated with pinch hitting, which I believe is built into OOTP, at least to a certain extent, so you would want to make sure that the pinch hitter you are using has at least around the same baseline ability as the player you are hitting for.
I went back and did some linear regression, which confirmed the approximate findings that I reported yesterday for the hitters. I also spent some time trying to calculate replacement levels for different positions, both with an MLB quickstart and with a fictionally generated game with 30 teams, though in the end that analysis didn't amount to much.
I should emphasize that the calculations that you see on the Mariners' page are not absolute measurements of value in any sense. They are relative measurements of players' batting abilities, starting pitching abilities, and relief pitching abilities.
Next step: I think I'm going to create a totally average team, then simulate seasons against a team that is identical except for one particular attribute. This will be a good way to see if the above conclusions play out in actual seasons, as well as to test the relative value of defense, base running, etc.
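The pitcher numbers above can be sketched the same way as the hitter summary. This is a rough Python illustration of the rounded figures only; the names are made up, the break is assumed to sit exactly at 10 on the 1-20 scale, and the reliever factor is the simple divide-by-3 described in the post.

```python
# (runs saved per point above 10, runs lost per point below 10),
# copied from the figures in the post.
PITCHER_RUNS_PER_POINT = {
    "stuff": (4.0, 4.0),
    "movement": (4.5, 4.5),
    "control": (2.5, 6.0),
}
RELIEVER_SHARE = 1 / 3  # 300 AB baseline for relievers vs 900 for starters


def pitching_runs_above_average(ratings, reliever=False, baseline=10):
    """Estimate runs saved relative to an average pitcher from 1-20 ratings."""
    total = 0.0
    for name, rating in ratings.items():
        above, below = PITCHER_RUNS_PER_POINT[name]
        diff = rating - baseline
        total += diff * (above if diff >= 0 else below)
    return total * RELIEVER_SHARE if reliever else total


def platoon_gain(ratings):
    """Runs gained by adding one point to every attribute (platoon advantage)."""
    boosted = {name: rating + 1 for name, rating in ratings.items()}
    return pitching_runs_above_average(boosted) - pitching_runs_above_average(ratings)


average = {"stuff": 10, "movement": 10, "control": 10}
wild = {"stuff": 10, "movement": 10, "control": 8}
print(platoon_gain(average))  # 4 + 4.5 + 2.5 = 11.0
print(platoon_gain(wild))     # 4 + 4.5 + 6.0 = 14.5
```

The two printed values correspond to the "ten-run" and "more like fifteen" platoon figures in the post.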
11-13-2013, 04:01 PM | #7 |
Minors (Rookie Ball)
Join Date: Jun 2011
Posts: 29
|
This is fantastic stuff. Exactly the kind of things I've been looking for in this forum!
|
11-13-2013, 05:48 PM | #8 | |
Hall Of Famer
Join Date: Apr 2007
Location: Toronto
Posts: 9,162
|
And if you are planning to run empirical tests using 'average' players, be aware that the average ratings values in any default long-running fictional league do not tend to be 50/100. I think you'll find the average ratings for starting pitchers tend to be closer to 60/100 (a bit higher than that for Movement, and a bit lower in Stuff and Control). Batting ratings tend to average out closer to 50/100. And if you plan first to determine reasonable ratings averages as a baseline, I'd suggest simming ahead at least 10 years after creating a fictional league. The distribution of ratings values at league creation seems to me to be a bit of a mess, and is quite different from what you'll find after the game has had a chance to get a few draft classes to the Majors. |
|
11-13-2013, 10:21 PM | #9 | |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Quote:
|
|
11-14-2013, 02:17 PM | #10 | |
Hall Of Famer
Join Date: Apr 2007
Location: Toronto
Posts: 9,162
|
Quote:
|
|
11-14-2013, 04:19 PM | #11 | |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
I'm not expecting this next stage of experimentation to produce linear results. Logarithmic seems much more likely, for the additional reason that every hit, walk, or home run leads to another opportunity for the team to bat, so the rate at which "good outcomes" happen increases faster than the rate at which outs are made. However, it does give a good proxy for which attributes are the most "valuable," all things considered, which is really the question I'm interested in in the first place.
|
11-14-2013, 05:56 PM | #12 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Initial results of sim study
So, again, what I've done here is make two teams of 100/200-rated clones: the Average Joes and the Dependent Variables. The only areas where I've had to deviate from that are defense, where I got players rated 100/200 overall at their position (which necessitates monkeying with their range and error values until you get it right), and pitching, where, as injury log pointed out, you have to mess around with the inputs for Stuff and Movement until you get it right. For the Dependent Variables, I then pumped up one attribute to 200/200 for everyone on the team to see how much this would affect their run scoring over 10,000 games.
I actually screwed this up for Movement in the initial study, and was using pitchers rated 108/200 as a default, as well as for all of the data here. The second line of the data shows the results for 100/200 Movement ratings, which results in about 1000 more home runs per team and 2500 more runs.
So what does this show... I'll highlight the key results.
1. Contact was the most important offensive variable, boosting the team's winning percentage to 72%. Eye was actually close behind, at 70%, and Power was at 70% as well.
2. Avoid K's had only a small effect, but it did have SOME effect, which did not show up in the first round of tests. Both Avoid K's and Gap increased the team winning percentage to around 56%.
3. Defense was in between the three big offensive variables and the two less important ones, at 66%. I actually forgot to boost the pitchers' defense, so this may slightly underestimate the results here.
4. Speed, stealing, and base running clocked in around 52% apiece. I would guess there would be some synergy to having more than one of these increased simultaneously, though that could be said about a lot of other attributes too.
5. The three pitching attributes had a similar effect, though it was most pronounced with Stuff, which improved the winning percentage to 67%. Movement and Control were 62% and 59% respectively. I repeated the Movement test with the new 100/200 Movement baseline, which resulted in a 63% winning percentage, close enough to dissuade me from re-running the other tests with that baseline.
Next up: a check to see the results of changing an individual player's attributes rather than the whole team.
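To put the winning percentages above in context, here is a rough Python sketch of how much random noise to expect in a 10,000-game sample. It assumes each game is an independent coin flip with a fixed win probability, which is a simplification, and the function names are made up for illustration.

```python
import math


def win_pct_standard_error(p, games):
    """Standard error of an observed winning percentage (binomial model)."""
    return math.sqrt(p * (1 - p) / games)


def z_score(observed, expected=0.5, games=10_000):
    """How many standard errors the observed win% sits from the expected one."""
    return (observed - expected) / win_pct_standard_error(expected, games)


print(win_pct_standard_error(0.5, 10_000))
for label, pct in [("Speed", 0.52), ("Gap", 0.56), ("Contact", 0.72)]:
    print(label, round(z_score(pct), 1))
```

Under these assumptions the standard error is about half a percentage point, so even the 52% results for the running attributes sit several standard errors above .500, while differences of one point between attributes (62% vs 63%) are within the noise.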
11-14-2013, 09:02 PM | #13 |
Hall Of Famer
Join Date: Apr 2007
Location: Toronto
Posts: 9,162
|
When your defense is below average, Stuff is going to become more important, because with bad defense behind him, it's harder for a pitcher to get outs any other way than by strikeouts. I don't know if you checked your BABIP total for your league, but 50/100 position ratings in OOTP are well below average (in one of my standard fictional leagues, the average rating for starting LFs is in the mid-80s). If your low defensive ratings have inflated BABIP, that may account for your finding that Stuff is more important than other ratings.
|
11-14-2013, 10:24 PM | #14 | |
OOTP Developments
Join Date: Aug 2007
Location: Nice, Côte d'Azur, France
Posts: 19,757
|
In a league where everyone's rated 20/200 but with the same modifiers as the modern MLB quickstart, a guy with all ratings at 50/100 would likely be a .400 hitter or a 70 home run guy. I'm making those numbers up, to be honest, but the point is that the expected output doesn't depend solely on the ratings.
It's not the specific individual ratings that are primarily important in determining stats output. Rather, the modifiers are all-important, followed by the distribution of ratings, whatever scale they may use.
The info in the editor is simply a nice guide to expected performance in a modern-type MLB league, but it is in no way conclusive or even particularly accurate in many cases. It's just a very, very rough estimate of expected results in one possible environment.
The expected statistical output can vary almost infinitely depending on the modifiers and the ratings matrix used in a given league at a given level. Even in a modern MLB environment, if too many very high or very low rated players are included, the expected statistical output for a given player with the exact same ratings will vary significantly. The projections in the editor don't take any of that context into account.
That's not to say your tests aren't good or that they won't be useful; they will. It's a good idea. You'll just want to be aware of the situation so that you know what to expect and can design your tests around how OOTP's statistical and ratings model actually works, rather than how the calculator in the editor rather inaccurately makes it appear to work.
Last edited by Lukas Berger; 11-15-2013 at 12:13 AM.
|
11-15-2013, 09:32 AM | #15 |
Minors (Rookie Ball)
Join Date: Jun 2011
Posts: 29
|
I'm very interested in a similar study that tests how a change in individual defensive ratings (range, arm, etc.) changes performance.
|
11-15-2013, 02:53 PM | #16 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
I didn't do this per se. I'm making an assumption (which is probably wrong) that a 200/200 third baseman is about the same regardless of whether he gets his points from arm, range, error, or double play ratings. I did look at which defensive position was most valuable, which I will show in a bit.
|
11-15-2013, 03:03 PM | #17 | |
Minors (Rookie Ball)
Join Date: Jun 2011
Posts: 29
|
Quote:
|
|
11-15-2013, 03:40 PM | #18 |
Hall Of Famer
Join Date: May 2011
Posts: 3,029
|
Sim study, improving individual players
OK, for this round, I took my 100/200 clones and then pumped up one stat for one individual player on the Dependent Variables team.
1. Contact and Eye were again the most valuable. In fact, in this test Eye was a bit more valuable, but I think that was in part because the Average Joes had an uncharacteristically poor 10,000-game stretch. (Even 60 seasons is not long enough for the luck to even out!) These were worth about 35 runs per 100 games. Power was worth about 30 runs per 100 games.
2. The cumulative effect of increasing Running Speed, Stealing, and Baserunning was about the same as increasing Gap or Avoid K's, about 20 runs.
3. Each of the pitching attributes accounted for about 30 runs per 100 games when a starter was improved. The effect of improving a reliever's Stuff was about 1/2 as valuable as the effect of improving a starter's Stuff.
4. Defense was fascinating. The outfielder fielding ratings made an enormous difference; in fact, increasing any outfielder's ratings led to about 40 runs per 100 games. Improving infielders led to 30 (SS), 25 (3B), 20 (2B), and 5-10 (1B). As for catcher, the first time I ran this it led to an increase of 10 runs, which seemed super low. I re-ran it and got about 30 runs the second time, which seems more likely. Of course, the difference between the two sims (again, this is 10,000 games we're talking about!) casts the whole exercise in doubt.
So, I think I'm done experimenting for now. There were a few things I came away with...
1. Outfield defense is immensely important in this game. In the infield, catcher and shortstop defense are probably the most important.
2. All things considered, a point in Contact is worth more than anything else.
3. Baserunning skills are icing on the cake, but not worth nearly as much as other attributes, even Avoid K's.
4. Improving an attribute in a starting pitcher is roughly as helpful as improving Contact, Power, or Eye.
Again, keep in mind, these are all looking at DRAMATIC increases in one attribute: going from 100/200 to 200/200. I don't have the statistical chops or patience to go through increasing each attribute by 10, but if someone wants to follow up, be my guest!
11-15-2013, 04:15 PM | #19 | |
All Star Starter
Join Date: Dec 2001
Location: near Rochester, NY
Posts: 1,269
|
I would suggest that the problem with the defense testing may be this: those position ratings (the overall rating for shortstop or catcher, for example) are probably not used by the game engine. They are simply an estimate for the manager. Rather, I'm thinking the game engine uses the ratings in infield arm, turn double play, range, and so on, plus experience. I would not assume that a shortstop rating of X based upon weak range and arm but strong double play and error avoidance would necessarily lead to about the same results as a shortstop rating of X based upon the opposite set of subratings.
I can offer one bit of anecdotal evidence to support this: in an online league, I was playing a shortstop with a very high overall rating but a very low double play rating and simply assumed that he was doing a good job. But with my pitching suffering, late in the season I examined the defensive stats and found that my team was last in the league in double plays, which is poison when you have a bunch of groundball pitchers who don't strike out a whole lot of batters.
One other thing. I have noticed in gameplay the same thing you find: that outfield range is extremely important in OOTP. However, I'd suggest it may be highly unequal when you compare the three different outfield positions... Still, I wonder whether, in major league baseball, it is really true that you help your team much more with a great centerfielder than with a great shortstop. Not sure. But, in OOTP, you can see this effect by looking at the zone ratings.
|
11-15-2013, 08:13 PM | #20 | ||
Hall Of Famer
Join Date: Apr 2007
Location: Toronto
Posts: 9,162
|
- the Speed rating controls (in part) how often a player attempts to steal, and in part how many doubles he turns into triples. If you raise Speed without raising the Steal rating, you risk making a player less valuable, because he'll attempt steals even though he's not good at stealing. And raising only the Steal rating while leaving the Speed rating at 50/100 won't have a huge effect, because the player won't attempt many steals. Those ratings work in synergy; it's only by raising both that you can really measure how important they are, since most 'fast' players in any normal league are well above average at both.
- your methodology likely makes defense appear much more important than it is in any normal league. 50/100 is not the average position rating in any normal league (it depends on the position, but 70/100 is closer). You have some defenders rated 100/100, and most rated 50/100, so your elite defenders are way further above average than any actual OOTP defender ever could be.