First of all, if you are interested, you can find all the details about Rating Translator in the included documentation. In any case, I'll try to address your questions below.
Quote:
|
Originally Posted by obaslg
1) For pitchers, from your description it looks like you determine very specific stats (e.g., hits, Ks, BBs) from regression and then calculate predicted ERA from that. I calculate ERA straight from the ratings (from my own regressions) to cut out the middle man. Is there a reason you prefer to do the extra step?
|
My thinking is that it's easier, and probably more logical, to run the regression on the most direct cause-and-effect relationship: each rating against its corresponding rate stat. That way, reasonably accurate elemental rate stats can be obtained. Also, ERA is NOT purely a product of the pitcher's ability. To put it simply:
ERA = DIPS ERA + park effect + defense + luck
I consider only the DIPS ERA part to be the direct product of the pitcher's ability. This is why I did NOT provide a direct ERA projection in Rating Translator. Instead, various sabermetric ERA projections are used: FIP ERA, DIPS ERA, and ERC. These sabermetric ERAs are themselves calculated from various elemental pitching stats. So I do the ERA projection in 'two' steps, mirroring how those sabermetric ERAs are calculated.
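To illustrate the 'two' step idea, here is a minimal Python sketch. It is NOT the actual Rating Translator code. The rating names (stuff, control, movement), the slopes and the intercepts are made-up placeholders, and hit batters are left out to keep it short. The FIP formula is the commonly published one, (13*HR + 3*BB - 2*K) / IP plus a league constant, rewritten for per-9-inning rates; the 3.10 constant is only a placeholder since the real value depends on the league.

```python
# Hypothetical sketch: ratings -> per-9-inning rate stats -> FIP.
# All slope/intercept values below are invented placeholders.

def linear(rating, slope, intercept):
    """One-variable linear fit: stat = slope * rating + intercept."""
    return slope * rating + intercept

def project_rates(stuff, control, movement):
    """Step 1: map each rating to its own rate stat (per 9 innings)."""
    return {
        "k9": linear(stuff, 0.08, 2.0),        # strikeouts per 9
        "bb9": linear(control, -0.04, 6.0),    # walks per 9
        "hr9": linear(movement, -0.015, 2.0),  # home runs per 9
    }

def fip_from_rates(k9, bb9, hr9, constant=3.10):
    """Step 2: FIP from per-9 rates (hit batters ignored in this sketch)."""
    return (13 * hr9 + 3 * bb9 - 2 * k9) / 9 + constant

if __name__ == "__main__":
    rates = project_rates(stuff=50, control=50, movement=50)
    print(rates, round(fip_from_rates(**rates), 2))
```

The point is that each regression in step 1 only has to model one rating against one stat, and step 2 is just plugging those stats into a known formula.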
There are also secondary reasons why I chose to begin with elemental stats. I feel that multiple regression is best avoided if possible, for several reasons. With multiple regression, model adequacy and fit are paramount issues, and we simply do not know how ERA should be modeled to begin with. You can NOT just assume a linear model and stick with it without testing other possible models. Coding tests for model adequacy and fit in a multiple regression would complicate the code a lot, and there is no guarantee that I could code them well. Besides that, possible over-specification and under-specification would have to be examined.
Quote:
|
Originally Posted by obaslg
2) How were your R squareds for the batter and pitcher regressions?
|
I have only done some informal, small-scale tests. I can't remember the exact numbers off the top of my head; they are somewhere in my archive. As a ballpark, the r^2 (proportion of variance explained by the regression) for the elemental stat predictions is at least 0.5. Most are at least 0.6, and some stats could be higher than that. Bear in mind that this comes from very limited testing, run across the whole league (even players with only 1 PA or BF count). Another thing to note is that the stats themselves are the results of random events, so the inherent variance in actual stats is pretty large to begin with.
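For anyone who wants to check r^2 on their own data, here is a rough sketch of how it can be computed for a single rating against a single rate stat using an ordinary least-squares line. The function name and the sample numbers are invented for illustration only; they are not results from my tests.

```python
# Hypothetical sketch: r^2 for a one-variable least-squares fit.

def r_squared(xs, ys):
    """Return the proportion of variance in ys explained by a linear fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

if __name__ == "__main__":
    ratings = [30, 40, 50, 60, 70]   # invented sample ratings
    k9 = [4.1, 5.4, 5.8, 7.1, 7.5]   # invented sample strikeout rates
    print(round(r_squared(ratings, k9), 3))
```

With small samples (such as players with 1 PA or BF), the actual stats are so noisy that r^2 will be dragged down no matter how good the fit is.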
Hope this answers your questions a bit.