regression on pitchers - ratings meaningless? - Page 4

ctorg · 05-11-2005, 10:37 AM

Quote:

Originally Posted by Dan Theman

Didn't Markus once describe a moderate bonus provided by a high score in any of the pitcher skills? I seem to recall that it was back when he realized that DIPS didn't quite cut it on its own, and he decided to implement a slight change with one of his patches.

Yes he did.

You know what would be interesting to see? The difference in regressions done for different eras.

Basically, you run them for a deadball-era league, a league from the 1950s, and a modern league. You could use "recalculate for historical accuracy" or something like it to get league totals, and see what happens.

Operation Shutdown · 05-11-2005, 02:08 PM

Elendil

Just curious as to what ratings you used for batters. It might be interesting to see what happens if you set all batters' talents to something around 50 or 60 (just something that's all the same). I think that may make a more controlled environment.

Elendil · 05-11-2005, 02:46 PM

Quote:

Originally Posted by Operation Shutdown

Elendil

Just curious as to what ratings you used for batters. It might be interesting to see what happens if you set all batters' talents to something around 50 or 60 (just something that's all the same). I think that may make a more controlled environment.

Yeah, I'm not sure what I think of that. I recall a study done of defense on this forum, and it was criticized b/c all teams' hitters and pitchers were made exactly the same, and the study found a huge effect of SS/2B/CF defensive range on team wins (about 3X what sabermetricians think the effect of defensive range is). The critics said that making all the hitters and pitchers the same made wins extremely dependent on fielding, compared to "real life." I'm not sure whether that same problem would affect this study. Ideally, of course, you'd like to have all pitchers face the same opponents, so that you control for strength of opponent. Since that variation in strength of opponent is probably random across pitchers, though, I don't think it'll affect the results apart from increasing standard errors slightly.

Elendil · 05-11-2005, 03:27 PM

Just to satisfy my own curiosity, I altered the league stats to make the league totals similar to those for the 1986 AL (a DH is used in both leagues), then simulated the season and redid all these analyses. As I expected, movement was now important and statistically significant. However, none of the variables produced extremely strong results. The basic fact is that ERA is a good, but not great, measure of pitcher effectiveness. There's a lot of noise in ERA, coming both from the way it's counted (e.g., if an error extends an inning, runs that score after the point where the third out should have been made don't count as ERs) and from luck (deviations in BABIP, streaky hitting by the opponent, unusually good baserunning by the opponent). But in general, stuff, control, and movement all behave as we'd expect. Which rating matters most depends on what kind of league you have: if homers are abnormally low, movement doesn't mean much; if strikeouts are abnormally high, stuff doesn't mean much; if walks are abnormally low, control doesn't mean much.
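A rough way to see that last point is FIP, the fielding-independent ERA estimator that weights homers, walks, and strikeouts as (13*HR + 3*BB - 2*K) / IP plus a league constant. The sketch below assumes, purely for illustration, that good movement cuts a pitcher's homer rate by 20% relative to league average. That percentage, the 200 IP workload, the walk and strikeout rates, and the 3.10 constant are placeholders rather than OOTP values, and HBP is left out of the walk term for simplicity.

```python
def fip(hr, bb, k, ip, constant=3.10):
    """Simplified FIP: (13*HR + 3*BB - 2*K) / IP + constant (HBP ignored)."""
    return (13 * hr + 3 * bb - 2 * k) / ip + constant


def movement_value(league_hr_per_9, hr_cut=0.20, ip=200.0, bb_per_9=3.0, k_per_9=6.0):
    """FIP saved by a pitcher whose movement (hypothetically) trims HR allowed by hr_cut."""
    nines = ip / 9  # number of nine-inning units pitched
    bb, k = bb_per_9 * nines, k_per_9 * nines
    average = fip(league_hr_per_9 * nines, bb, k, ip)
    better = fip(league_hr_per_9 * (1 - hr_cut) * nines, bb, k, ip)
    return average - better


# Same 20% homer suppression in a normal-HR league vs. a very low-HR league
for hr9 in (1.1, 0.5):
    print(f"league HR/9 = {hr9}: movement lowers FIP by {movement_value(hr9):.2f}")
```

Under these assumptions the same relative homer suppression is worth about 0.32 of FIP in the first league and only about 0.14 in the second, simply because there are fewer homers to take away.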

crazyhorsejohnny · 05-11-2005, 04:24 PM

Too much time on your hands...

Bobble · 05-11-2005, 05:08 PM

Quote:

Originally Posted by crazyhorsejohnny

Too much time on your hands...

Thank you, Johnny, for using your precious time to come in here and be a jackhole.

Crapshoot · 05-11-2005, 05:09 PM

Quote:

Originally Posted by crazyhorsejohnny

Too much time on your hands...

Too little intellect in yours....

Dagrims · 05-11-2005, 05:18 PM

I remember the last time he offered something productive. Wait, no I don't.

Elendil, a league that I was a part of for an inaugural season (OTBA) used the same creation modifiers that you used to generate its pool of players. The league totals were also very low in walks and had fewer homers than expected. Has this shown up consistently using Skydog's modifiers?

obaslg · 05-11-2005, 05:20 PM

Quote:

Originally Posted by beorn

One strong possibility is that "stuff" produces a special effect at a certain point. I believe that somewhere around 75 or 80, the pitcher gets a special decrease in % of balls in play that go for hits.

Thus, every point of increase in stuff decreases the chance of a hit, by making a strikeout more likely. But at a certain point, stuff has a second effect.

That definitely fits what I've seen, and it drives me crazy. That's a terrible way to make the game, IMO. If the jump from 50 to 51 is different from the jump from 99 to 100, why bother with rating numbers?
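To make beorn's hypothesis concrete, here is a toy model, not OOTP's actual engine: strikeout rate rises linearly with stuff, and BABIP drops by a flat 10 points once stuff reaches 75. Every number in it (the 10% to 25% K range, the .300 base BABIP, the threshold and the bonus) is an assumption chosen for illustration. It shows why the complaint above follows: under a threshold, one rating point is worth far more at the cutoff than anywhere else.

```python
def k_rate(stuff, lo=0.10, hi=0.25):
    """Hypothetical strikeout rate, rising linearly over the 1-100 rating scale."""
    return lo + (hi - lo) * (stuff - 1) / 99


def babip(stuff, base=0.300, threshold=75, bonus=0.010):
    """Hypothetical BABIP with a one-time step down at the stuff threshold."""
    return base - (bonus if stuff >= threshold else 0.0)


def hits_per_pa(stuff):
    """Hits per plate appearance, ignoring walks and homers to keep it simple."""
    return (1 - k_rate(stuff)) * babip(stuff)


def marginal_gain(a, b):
    """Hits per PA saved by moving from stuff a to stuff b."""
    return hits_per_pa(a) - hits_per_pa(b)


for a, b in [(50, 51), (74, 75), (99, 100)]:
    print(f"stuff {a} -> {b}: {marginal_gain(a, b):.4f} fewer hits per PA")
```

In this toy model the 74-to-75 step is worth roughly 18 times the 50-to-51 step, while 50-to-51 and 99-to-100 are nearly identical.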

Crapshoot · 05-11-2005, 05:21 PM

Quote:

Originally Posted by Elendil

Yeah, I'm not sure what I think of that. I recall a study done of defense on this forum, and it was criticized b/c all teams' hitters and pitchers were made exactly the same, and the study found a huge effect of SS/2B/CF defensive range on team wins (about 3X what sabermetricians think the effect of defensive range is). The critics said that making all the hitters and pitchers the same made wins extremely dependent on fielding, compared to "real life." I'm not sure whether that same problem would affect this study. Ideally, of course, you'd like to have all pitchers face the same opponents, so that you control for strength of opponent. Since that variation in strength of opponent is probably random across pitchers, though, I don't think it'll affect the results apart from increasing standard errors slightly.

Are you talking about Moyer's old OOTP 4 Defense study? I think it showed some amazing flaws in the engine: a top-end SS was worth about 10 wins more than a bottom-of-the-barrel one.

obaslg · 05-11-2005, 05:26 PM

Quote:

Originally Posted by Aadik

Are you talking about Moyer's old OOTP 4 Defense study? I think it showed some amazing flaws in the engine: a top-end SS was worth about 10 wins more than a bottom-of-the-barrel one.

I don't know if it was a flaw in the engine, but defense did use to be enormously important. That study was problematic because it used wins, but I did one at the time using runs allowed, and the difference was huge.
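One reason runs allowed is the cleaner yardstick is that wins are a noisy, nonlinear function of runs scored and runs allowed. Bill James's Pythagorean expectation, W% = RS^2 / (RS^2 + RA^2), is the usual way to translate between the two. The sketch below uses the simple exponent of 2 and a 162-game schedule (both adjustable parameters); the 750-run baseline is an arbitrary example, not a figure from either study.

```python
def pythag_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Expected wins from runs scored and allowed (Pythagorean expectation)."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


baseline = pythag_wins(750, 750)        # an average team: 81.0 wins
better_defense = pythag_wins(750, 650)  # same team allowing 100 fewer runs
print(f"{better_defense - baseline:.1f} extra wins from 100 fewer runs allowed")
```

By this yardstick, 100 fewer runs allowed is worth about 11.5 wins, so a 10-win gap between shortstops corresponds to something on the order of 90 runs allowed over a 162-game season.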

Elendil · 05-11-2005, 07:20 PM

Quote:

Originally Posted by Dagrims

Elendil, a league that I was a part of for an inaugural season (OTBA) used the same creation modifiers that you used to generate its pool of players. The league totals were also very low in walks and had fewer homers than expected. Has this shown up consistently using Skydog's modifiers?

You know, his modifiers might be sensitive to things like league size, esp. if OOTP doesn't automatically expand the talent pool for the fantasy draft the way it should for big leagues. I've used those modifiers for a few leagues of my own, and never noticed much untoward, but then I've always maintained a close watch over the engine stats as well, and tweaked them frequently. Also, all my leagues have been pretty small (16-24 teams). I'm sure Skydog has done a lot more testing, though, so he's more qualified to respond here.

TonyJ · 05-11-2005, 07:23 PM

Quote:

Originally Posted by crazyhorsejohnny

Too much time on your hands...

Clutter

Joshv02 · 05-11-2005, 07:32 PM

Quote:

Originally Posted by beorn

One strong possibility is that "stuff" produces a special effect at a certain point. I believe that somewhere around 75 or 80, the pitcher gets a special decrease in % of balls in play that go for hits.

Thus, every point of increase in stuff decreases the chance of a hit, by making a strikeout more likely. But at a certain point, stuff has a second effect.

Yes, I think this is exactly what Markus described here:

Quote:

So, I adjusted the engine slightly: I gave pitchers with a high stuff rating a small advantage in BABIP, while ones with low stuff ratings got a small disadvantage.
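Markus's description doesn't say what shape that adjustment takes. One hypothetical reading, sketched below, is a small linear shift centered on average stuff (50 on a 1-100 scale) of at most 10 points of BABIP either way; the midpoint, the size of the shift, and the linear form are all assumptions. Under a linear shift every rating point is worth the same, unlike the threshold beorn describes, so a visible jump near 75-80 in the data would suggest something other than this simple form.

```python
def babip_shift(stuff, max_shift=0.010, midpoint=50, scale=50):
    """Hypothetical BABIP adjustment: negative helps the pitcher, positive hurts him."""
    stuff = max(1, min(100, stuff))  # clamp to the 1-100 rating scale
    return max_shift * (midpoint - stuff) / scale


for s in (20, 50, 80, 100):
    print(f"stuff {s}: BABIP shift {babip_shift(s):+.3f}")
```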

Elendil · 05-11-2005, 07:37 PM

Quote:

Originally Posted by Aadik

Are you talking about Moyer's old OOTP 4 Defense study? I think it showed some amazing flaws in the engine: a top-end SS was worth about 10 wins more than a bottom-of-the-barrel one.

Actually, this is one I was thinking of:
http://www.ootpdevelopments.com/boar...=defense+study

Looking at it again, I see it examines fielding percentage as well, but I think the criticisms of the methodology are sound.

BPS · 05-12-2005, 02:35 AM

My two cents:

A quick look at your regressions suggests a likely problem with multicollinearity. Your regressions have lots of interaction variables (movement * control, control * stuff), right? If so, many of these independent variables might be highly correlated with each other, and if that's the case, you can't trust the coefficients of your regression.

You can get a reasonably good R² in such regressions, but the coefficient estimates and t-stats can be wacko.

Possible solutions: (1) Check for correlation among the independent variables; if two variables are highly correlated, maybe include only one of them in the regression. Try first one and then the other, and keep the one that gives you the better R² (or the better "theoretical justification"). (2) Start with just a few basic variables and add the others one by one, keeping a new variable only if it bumps up your adjusted R²; otherwise, into the trash heap it goes.
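Suggestion (2) is forward stepwise selection by adjusted R², where adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors. Here is a minimal pure-Python sketch: the hypothetical helper ols_r2 fits ordinary least squares through the normal equations, and forward_select keeps adding whichever candidate raises adjusted R² the most until none does. The data are simulated: ERA is made to depend on stuff and control plus noise, and a "junk" rating has no effect. The coefficients are invented for illustration.

```python
import random


def ols_r2(y, columns):
    """R^2 of an OLS fit of y on an intercept plus the given predictor columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gauss-Jordan elimination
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    rhs = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[pivot] = A[pivot], A[c]
        rhs[c], rhs[pivot] = rhs[pivot], rhs[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                for j in range(c, k):
                    A[r][j] -= f * A[c][j]
                rhs[r] -= f * rhs[c]
    coef = [rhs[i] / A[i][i] for i in range(k)]
    mean_y = sum(y) / n
    ss_res = sum((yi - sum(b * x for b, x in zip(coef, row))) ** 2 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def forward_select(y, candidates):
    """Greedy forward selection: add the variable that most improves adjusted R^2."""
    selected, best_adj = [], 0.0  # an intercept-only model has adjusted R^2 of 0
    while True:
        best_name = None
        for name in candidates:
            if name in selected:
                continue
            trial = selected + [name]
            r2 = ols_r2(y, [candidates[v] for v in trial])
            adj = adjusted_r2(r2, len(y), len(trial))
            if adj > best_adj:
                best_adj, best_name = adj, name
        if best_name is None:
            return selected, best_adj
        selected.append(best_name)


# Simulated pitchers: ERA depends on stuff and control; "junk" has no effect
rng = random.Random(42)
n = 200
ratings = {name: [rng.uniform(1, 100) for _ in range(n)] for name in ("stuff", "control", "junk")}
era = [7.0 - 0.03 * s - 0.02 * c + rng.gauss(0, 0.8)
       for s, c in zip(ratings["stuff"], ratings["control"])]

chosen, adj = forward_select(era, ratings)
print(chosen, round(adj, 3))
```

Adding a variable raises adjusted R² whenever its t-statistic exceeds 1 in absolute value, so a useless rating like junk can still slip in by chance.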

You might also check for heteroscedasticity by looking at your residuals. This might help you better identify where nonlinearities or other regression misspecifications exist.

Elendil · 05-12-2005, 10:06 AM

I think you're right about multicollinearity. That's the reason I also tried running the analyses with the independent variables pared down to just stuff, movement, control, & velocity. In those analyses I got stronger results on the individual coefficients, but significance levels were still in the 90-99% range, nothing above that. I think those are reasonable findings, both because ERA has noise in it and because ratings (rightly) don't completely determine performance over the course of a single season, even one with 190 games. Sometimes players will perform above or below their ratings.

Dr. C-Mac · 05-12-2005, 11:30 AM

Thanks, Elendil and others, for your statistical analyses. I've been wanting to do the same thing for months now, but was waiting until my semester was over (I'm a college professor). I agree with the point made yesterday about multicollinearity, particularly given the simultaneous use of both the ratings variables and the squared ratings variables in a regression model. There will obviously be a high degree of multicollinearity between those paired ratings variables, potentially confounding your results. I would suggest using only one of each pair (i.e., use either stuff or squared stuff in the model, whichever correlates more strongly with ERA, but not both) to help in this regard. The other thing I would recommend, looking at your output, is to increase the sample size considerably. The number of observations per independent variable in some of these models is far below most accepted standards. One way to alleviate this problem would be to run maybe ten simulated leagues and pool all of the data, rather than using just a single season of a single league. In this regard, I would recommend running single seasons of multiple separate leagues rather than multiple seasons of the same league, to avoid violating regression's independence-of-observations assumption. Further, this might help eliminate any league-by-league anomalies, such as the one in your previous post where one league you ran seemed to lean toward being a pitcher's league.

Thanks for your research into this. Of all the posts/threads we see on the message boards, this is one that I feel may contribute towards making a better game product in the future.
