DIPS, Defense, and 1974: A Case Study

swampdragon · 06-10-2009, 09:39 PM

Taking this outside as suggested by Steve P. I think the issues pstrickert and others are discussing are important. Let me theorize for a moment and see whether there's general agreement.

1. OOTP is very good at modeling the attributes it is designed to model, provided the correct totals modifiers are being used. Garlon has convinced me of that multiple times.
2. Nonetheless, OOTP is not very good at recreating the dynamics of a pennant race replay. If accuracy means the results will be close to real life, it's not terribly accurate, at least as far as runs allowed are concerned.
3. The key stat driving that differential is hits allowed, both for individual pitchers and for teams as a whole. DIPS takes that stat away from the individual pitchers.
4. The available stats in Lahman are limited as far as defense is concerned.
5. Unless we're going to pretend that the original results were mostly luck and that there's no problem, the task for historical OOTP has to be improving the importing or modifying of defensive ability. Everyone with me so far?

old timer · 06-11-2009, 07:10 AM

Quote:

Originally Posted by swampdragon

4. The available stats in Lahman are limited as far as defense is concerned.
5. Unless we're going to pretend that the original results were mostly luck and that there's no problem, the task for historical OOTP has to be improving the importing or modifying of defensive ability. Everyone with me so far?

I just did some tests where I tweaked the A's and Dodgers' defensive ratings for a few of their players that I thought were off, and both teams' BABIPs and overall performances were similar to real life. This happened in multiple tests. Without these adjustments, both teams' BABIPs were much higher than in real life and both generally performed far worse too.

In other words, defensive ratings do seem to be a big problem, but how do we fix them? Does anyone aside from Markus even know how the ratings are calculated upon importing?

If we had direct access to the game database, we could make a utility that could modify the ratings based on our own algorithm. I don't think he's going to give us such access, however. So short of asking Markus to improve his algorithm and hoping he looks at it, is there anything practical that we can do that would help him improve the defensive ratings?

swampdragon · 06-11-2009, 08:48 AM

Quote:

Originally Posted by old timer

I just did some tests where I tweaked the A's and Dodgers' defensive ratings for a few of their players that I thought were off, and both teams' BABIPs and overall performances were similar to real life. This happened in multiple tests. Without these adjustments, both teams' BABIPs were much higher than in real life and both generally performed far worse too.

In other words, defensive ratings do seem to be a big problem, but how do we fix them? Does anyone aside from Markus even know how the ratings are calculated upon importing?

If we had direct access to the game database, we could make a utility that could modify the ratings based on our own algorithm. I don't think he's going to give us such access, however. So short of asking Markus to improve his algorithm and hoping he looks at it, is there anything practical that we can do that would help him improve the defensive ratings?

First things first. My suggestion would be to see if an accurate "season disk" can be created for 1974 that would touch nothing but defensive ratings, and would do that objectively, using the metrics on baseball reference, or fielding win shares, or other available advanced stats. If it could be done for that season, then the methods could be generally applied to other seasons for which those advanced stats were available.

It's also possible that the way to go, which would be doable within the Lahman database, is to develop a "team defense" metric, within which all of the defensive ratings would be modified by a set percentage according to team runs allowed. Or, if you've read the Bill James book on Win Shares, you'll remember his efforts to break those down between pitching and fielding. He has multiple formulae that he used to do that. Rather than reinventing the wheel, I suspect we could find a place on the net that broke those down for every team in baseball history. We could correlate those with the existing defensive ratings. I'm open to suggestions from interested parties. As I said, the first test would be to see whether we could use objective criteria to get 1974 right.

knockahoma · 06-11-2009, 10:24 AM

Putting on my future cap, I suspect big changes in attitude are coming at BABIP.

TOT/YR is a stat that works poorly in conjunction with BABIP. Bill James, in a 2006 article, admitted scratching his head over defensive stats that had strange variance. We're "missing" something, he said.

If you examine the TOT/YR, you'll see strange fluctuations that leave only a few inferences regarding the cause of those wide and sudden variances in good fielders:

1. Injury
2. Chance

Or the 3rd inference, which challenges current BABIP philosophy-- the pitchers are exerting much more influence than currently believed on balls in play.

knockahoma · 06-11-2009, 10:27 AM

Quote:

First things first. My suggestion would be to see if an accurate "season disk" can be created for 1974 that would touch nothing but defensive ratings, and would do that objectively, using the metrics on baseball reference, or fielding win shares, or other available advanced stats.

I'd love to see a "season disk". In fact, I re-edit 74 all the time for that. I'd suggest using the views of actual scouts and coaches as part of the equation, too.

STRAT-O-MATIC has had an excellent rep with professional baseball players over the decades. They dig into stats, but temper that with scouts, or TV commentators on their payroll.

I think that's important. Bill James writes about the shadow of the monster, how much is missing from the fielding math that we currently have. He says what's missing is important.

In other words, Math without Eyes may be as bad as Eyes without Math.

swampdragon · 06-11-2009, 11:58 AM

Quote:

Originally Posted by knockahoma

I'd love to see a "season disk". In fact, I re-edit 74 all the time for that. I'd suggest using the views of actual scouts and coaches as part of the equation, too.

STRAT-O-MATIC has had an excellent rep with professional baseball players over the decades. They dig into stats, but temper that with scouts, or TV commentators on their payroll.

I think that's important. Bill James writes about the shadow of the monster, how much is missing from the fielding math that we currently have. He says what's missing is important.

In other words, Math without Eyes may be as bad as Eyes without Math.

We don't have eyes for many of these seasons, and a season disk for 1974 would only be valuable beyond that season if it was based on math so that it could have a wider application. Markus is committed to DIPs, and it seems to have majority support in this community. If we're going to improve the OOTP experience for historical play, it's going to have to be within that framework. That's a high-level decision above our pay grade.

Individualized season quickstarts might be fun, and they'd probably be easier to do than what I have in mind, but you'd lose the career play. Still, I can see the advantages in the approach. Do you have a quickstart that works for 1974?

pstrickert · 06-11-2009, 12:07 PM

Markus said (as of today) that he'll work on the fielding problem for Patch #2. It would definitely help if we had some specific, detailed recommendations for him.

swampdragon · 06-11-2009, 01:35 PM

Quote:

Originally Posted by pstrickert

Markus said (as of today) that he'll work on the fielding problem for Patch #2. It would definitely help if we had some specific, detailed recommendations for him.

The easiest thing to do (not that it's all that easy) would be a team defense concept that adjusted (or recalced) the individual defenders by whatever percentage was necessary to get a team's BABIP to what it should be. Obviously you'd have to adjust for park, and possibly some other things as well. But you wouldn't need a new database.
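To make the team-defense idea concrete, here is a minimal Python sketch. It is not anything OOTP actually does. It estimates a team's BABIP allowed from the kind of season totals Lahman carries (hits, home runs and strikeouts allowed, outs recorded), then scales every defender's rating by the relative gap between a simulated BABIP and the real-life target. The function names, the `sensitivity` knob, the 1-100 rating scale, and the assumption that ratings move BABIP proportionally are all hypothetical. The balls-in-play estimate ignores double plays, baserunning outs and errors, and no park adjustment is applied.

```python
from dataclasses import dataclass


@dataclass
class TeamPitchingTotals:
    """Season totals allowed by one team (illustrative container)."""
    hits_allowed: int
    home_runs_allowed: int
    strikeouts: int
    outs_recorded: int  # innings pitched times three


def babip_allowed(t: TeamPitchingTotals) -> float:
    """Rough BABIP allowed: hits in play / (hits in play + outs in play)."""
    hits_in_play = t.hits_allowed - t.home_runs_allowed
    # Approximation: every out that was not a strikeout came on a ball in play.
    outs_in_play = t.outs_recorded - t.strikeouts
    return hits_in_play / (hits_in_play + outs_in_play)


def scale_ratings(ratings, simulated_babip, target_babip,
                  sensitivity=1.0, lo=1, hi=100):
    """Scale all defenders by the same percentage (hypothetical rule).

    If the simulation allows too many hits in play, the factor is above 1
    and every rating goes up; if too few, every rating goes down.
    """
    gap = (simulated_babip - target_babip) / target_babip
    factor = 1.0 + sensitivity * gap
    return {name: max(lo, min(hi, round(r * factor)))
            for name, r in ratings.items()}


if __name__ == "__main__":
    # Made-up totals and ratings, purely for illustration.
    totals = TeamPitchingTotals(1300, 100, 800, 4350)
    target = babip_allowed(totals)
    print(round(target, 3))
    print(scale_ratings({"SS": 60, "2B": 50}, 0.290, target))
```

In practice the `sensitivity` value would have to be found by trial: run a replay, compare the team's simulated BABIP with the target, adjust, and repeat.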

magnet · 06-11-2009, 01:45 PM

Quote:

Originally Posted by swampdragon

The easiest thing to do (not that it's all that easy) would be a team defense concept that adjusted (or recalced) the individual defenders by whatever percentage was necessary to get a team's BABIP to what it should be. Obviously you'd have to adjust for park, and possibly some other things as well. But you wouldn't need a new database.

I guess my follow-up to this would be: will this system only work on leagues that import players to their real-life teams? If the team is only half of who was really there in 1974, wouldn't the real-life team BABIP be essentially useless?

RonCo · 06-11-2009, 02:12 PM

To support historicals the way many want to play them, OOTP really needs a season-by-season set of rosters that sets defensive ratings. Someone could do that in their "spare time" by loading up the game, then doing a roster export/import (I think defense can be adjusted that way, anyway ... it works in v9, so I assume it works in X). Then save the game as a quickstart and post that.

It's a lot of effort, but could be worth it to the community if a few folks were to undertake it.

swampdragon · 06-11-2009, 03:07 PM

Quote:

Originally Posted by magnet

I guess my follow-up to this would be: will this system only work on leagues that import players to their real-life teams? If the team is only half of who was really there in 1974, wouldn't the real-life team BABIP be essentially useless?

That would be correct.

swampdragon · 06-11-2009, 03:36 PM

Quote:

Originally Posted by swampdragon

That would be correct.

Which means that the easiest way to do this probably doesn't work for the majority of players. Which gets us back to needing better defensive imports and the limitations of working within the Lahman database. I'm getting discouraged.

magnet · 06-11-2009, 03:56 PM

Quote:

Originally Posted by swampdragon

Which means that the easiest way to do this probably doesn't work for the majority of players. Which gets us back to needing better defensive imports and the limitations of working within the Lahman database. I'm getting discouraged.

I hope I didn't discourage anything; if this project works it would be a great addition, and make a lot of players' experience that much more enjoyable.

thehef · 06-11-2009, 04:15 PM

Hey Oldtimer, I'm curious as to which '74 Dodgers players you tweaked. I'm guessing Russell and Garvey were two... Also, did you find that you needed to do anything to the bullpen since Marshall was used far more often (and for more innings) than any AI would be likely to use him?

StyxNCa · 06-11-2009, 06:13 PM

Quote:

Originally Posted by RonCo

To support historicals the way many want to play them, OOTP really needs a season-by-season set of rosters that sets defensive ratings. Someone could do that in their "spare time" by loading up the game, then doing a roster export/import (I think defense can be adjusted that way, anyway ... it works in v9, so I assume it works in X). Then save the game as a quickstart and post that.

It's a lot of effort, but could be worth it to the community if a few folks were to undertake it.

The question is how to adjust them. I have asked for, but never seen, any kind of guide telling me how much adjustment is needed for a given result, especially for errors. I wouldn't mind adjusting my league if I had some kind of guideline to use.

old timer · 06-11-2009, 06:24 PM

It just occurred to me that a program could be made to experiment with defensive ratings. If you export the team rosters as a text file, the program could then read the Lahman database that comes with the game, come up with the new ratings and then modify the roster file for reimportation. That would remove the tedium of hand inputting the ratings.

Of course, the hard part would then be coming up with an algorithm for using the Lahman stats to come up with ratings that are consistently superior to what the game comes up with.
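The "easy part" might look something like the Python sketch below. It assumes the exported roster is a comma-separated text file with a header row; the real OOTP export layout is not described in this thread, so the column names (`player_id`, `range`, `arm`) are placeholders. The new ratings are passed in as a plain dictionary, standing in for whatever an algorithm working from the Lahman stats would produce.

```python
import csv
import io


def rewrite_roster(src_text, new_ratings, rating_fields=("range", "arm")):
    """Return roster text with selected defensive ratings replaced.

    src_text      -- exported roster as CSV text (hypothetical layout)
    new_ratings   -- {player_id: {field: value}} from some rating algorithm
    rating_fields -- the only columns this function is allowed to change
    """
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        updates = new_ratings.get(row["player_id"], {})
        for field in rating_fields:
            if field in updates:
                row[field] = str(updates[field])
        # Rows and columns not mentioned in new_ratings pass through unchanged.
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up roster rows, purely for illustration.
    roster = "player_id,name,range,arm\np1,Player A,50,60\np2,Player B,40,45\n"
    print(rewrite_roster(roster, {"p1": {"range": 65}}))
```

Restricting the writes to `rating_fields` keeps hitting and pitching ratings untouched, so any change in a replay can be attributed to defense alone.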

old timer · 06-11-2009, 06:28 PM

I could write such a program, but probably not the algorithm for deriving the superior ratings. In other words, I could do the easy part.

If someone without programming skills can figure out the hard part, I could program it. Of course, if someone can program and has ideas on how to better rate the players, that would be even better.

old timer · 06-11-2009, 10:41 PM

I'm still testing things out in game to get a feel for how changes to defensive ratings can impact a team and I thought I'd share the results.

I made changes (upward) to Campaneris' range and arm ratings and to Green's as well. In RL, Green didn't even play full time that year, but the AI uses him as the starter all year. Even changing just Green's or Campaneris' defensive ratings, but not both, made a noticeable improvement in the A's pitching outcomes.

Those were the only changes made in the whole league. The A's team pitching stats more closely resemble RL and the team consistently wins the division (all 6 times - small sample size, I know). Except for usage, the individual pitchers (on the A's) also performed closer to RL. However, I'm not suggesting these are the changes needed to make the A's play more like RL. I was merely interested in how "little" changes could affect certain outcomes.

I wonder how much better the game can do with rating players. Short of people doing the ratings manually (like roster sets in other games), does anyone believe the in-game ratings can be made much better?

Last edited by old timer; 06-12-2009 at 02:49 AM. Reason: GB% data was invalid due to small sample size.

swampdragon · 06-12-2009, 12:26 AM

Quote:

Originally Posted by old timer

I'm still testing things out in game to get a feel for how changes to defensive ratings can impact a team and I thought I'd share the results.

I made changes (upward) to Campaneris' range and arm ratings and to Green's as well. I noticed that Holtzman's HA were always better than RL and Hunter's were worse, so I switched their GB% (where do those numbers come from anyway?). Holtzman went from 51 to 47 and Hunter from 47 to 51. I figured since I improved the middle infield defensive ratings, this would lower Hunter's HA and raise Holtzman's HA and that's what has happened so far in 6 tests.

In RL, Green didn't even play full time that year, but the AI uses him as the starter all year. Even changing just Green's or Campaneris' defensive ratings, but not both, made a noticeable improvement in the A's pitching outcomes. Also, those small changes in GB% made more of an impact than I expected, but maybe they wouldn't have if I had run more tests. So MANY variables, each of which can have a significant impact on the outcome.

Those were the only 4 changes made in the whole league. The A's team pitching stats more closely resemble RL and the team consistently wins the division (all 6 times - small sample size, I know). Except for usage, the individual pitchers (on the A's) also performed closer to RL. However, I'm not suggesting these are the changes needed to make the A's play more like RL. I was merely interested in how "little" changes could affect certain outcomes.

I wonder how much better the game can do with rating players. Short of people doing the ratings manually (like roster sets in other games), does anyone believe the in-game ratings can be made much better?

Your experience does suggest that if they could be made better, the game would indeed come closer to replicating real life. So we would be tinkering with a real change rather than with a cosmetic one. It also suggests that relatively minor changes in a position or two might do the job. Now, if we only knew which ones...

I think we should give the upcoming patch a try, since Markus has said he will try to improve defensive imports. Then we can analyze the imports vs. what we know about the players.

old timer · 06-12-2009, 02:58 AM

Just wanted to note that I edited my post above regarding GB%. The GB% didn't seem to matter at all (at least I couldn't see any effect) after running many more tests. I should know better than to post results from so few tests.

Nevertheless, the two defensive changes did make a big difference and I'm hoping that a few such adjustments on each team (if necessary) will improve the replay. I'm going to see how much improvement can be made in the '74 replay without touching hitting and pitching ratings, but will hopefully resist posting results before sufficient testing has been done.
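For a rough sense of what "sufficient testing" means, the Python sketch below treats every ball in play as an independent coin flip. That is a simplifying assumption, not how OOTP or real baseball works. Under it, the standard error of a BABIP figure is sqrt(p(1-p)/n), and averaging several simulated seasons shrinks it by the square root of the number of seasons. The function names and the two-standard-error threshold are illustrative choices.

```python
import math


def babip_standard_error(babip, balls_in_play, seasons=1):
    """Standard error of an average BABIP over `seasons` simulated seasons."""
    if balls_in_play <= 0 or seasons <= 0:
        raise ValueError("balls_in_play and seasons must be positive")
    per_season = math.sqrt(babip * (1.0 - babip) / balls_in_play)
    return per_season / math.sqrt(seasons)


def seasons_needed(babip, balls_in_play, effect):
    """Smallest number of seasons for which two standard errors of the
    averaged BABIP are no larger than the effect being looked for."""
    per_season = babip_standard_error(babip, balls_in_play)
    return math.ceil((2.0 * per_season / effect) ** 2)


if __name__ == "__main__":
    # Illustrative inputs: a .280 BABIP over 4,400 team balls in play,
    # and over 800 balls in play for a single starting pitcher.
    print(babip_standard_error(0.280, 4400))
    print(babip_standard_error(0.280, 800))
    print(seasons_needed(0.280, 800, 0.010))
```

The single-pitcher case is the noisy one: with only a few hundred balls in play per season, a handful of replays cannot separate a small ratings effect from chance, which fits the GB% result disappearing once more tests were run.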
