10-11-2010, 01:33 PM   #4
Bristolduke
Thanks for the quick response. To help clarify, I'll break this into three areas.

1) The data in Spritze's HS DB. I sent him a PM about this, because it is probably more a case of my not understanding the premise, and it is better to discuss it with him directly. (He has responded, so that will get sorted.)

2) The neutralized data itself. There may or may not be anomalies in that data, and there are some differences across the databases. The assumption is that understanding the data will help in understanding how it is used.

3) How the statistical data is utilized. This should be the same regardless of whether the data is real or neutralized; only the underlying data is different. This may not be a valid assumption.


Part of the reason I am looking for a discussion is that I would like to develop a historical player guide, which would discuss how to set up various types of historical leagues. There seem to be a lot of questions on how to do certain things, and an overall guide would appear to be helpful.

The other part is to improve the game for OOTP 12 and beyond. That is why I re-did the master file for the bio data. I also think there is guidance we could establish as requirements to pass on to the OOTP developers. One such requirement, in my mind, is that a player should be rated for any position he played in a particular season/career. Don't over-react here, as I do believe there may be some caveats, but as a working premise it is a good starting point. I raise this because on-line historical leagues usually have rules about positions, and it is tough to argue that player X can't play position Y in the league because he has no rating for that position when he played that position in real life, even in that particular year.


So in looking at the responses:

Quote:
Originally Posted by Spritze
The neutralized database process would first look to fill any necessary additional data based on Mr. Jackson's complete career stats. He only played 2 additional games in CF in the remaining 3 years of his career and well over 200 games in LF.

If there is no additional career data available for a specific player the process would use replacement player data based on Year/League/Position.
Agreed that neutralization fills in the seasons, but the figure of only 2 additional games in CF is a real-life figure, not a neutralized one. Also not stated here, but assumed (and true in the league setup), is that fielding uses the 3-year value, not the imported season or career options. In fact, if you use imported season, he is rated at both LF and CF in 1901.

The key question, and I believe the improvement needed in the OOTP engine, is not whether his primary position should be LF or CF, but whether a player should have ratings at the positions he plays in real life.

I'll break this into two cases: simming with real-life stats and simming with neutralized stats.

Real-Life stats
Jackson doesn't play in 1903, so there is no 1903 season data (i.e. no third year of real-life data for the sim to use). Question: what does the sim do in that case? Setting that aside for a moment, the 3-year stats on which the ratings are based are:
LF 66 games/528 innings;
CF 60 games/480 innings;
RF 4 games/32 innings.

Just using a blink test, it is hard to understand why Jackson doesn't get rated at CF.
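
To put a number on the blink test, here is a minimal sketch that computes each position's share of games from the real-life 3-year figures above. The names are my own, and this is only arithmetic on the quoted totals, not a description of how OOTP actually derives ratings.

```python
# Real-life 3-year fielding totals quoted above: position -> (games, innings)
real_life = {"LF": (66, 528), "CF": (60, 480), "RF": (4, 32)}

def games_share(totals):
    """Return each position's share of total games played, as a fraction."""
    total_games = sum(games for games, _ in totals.values())
    return {pos: games / total_games for pos, (games, _) in totals.items()}

shares = games_share(real_life)
for pos, share in shares.items():
    print(f"{pos}: {share:.1%}")
```

CF works out to roughly 46% of his games over the window, which is why the missing CF rating looks odd.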

Neutralized stats
Here there are 3 years' worth of data, and the neutralization process has filled in the gap. One interesting note is that there is no neutralized RF data, but that is a later discussion.

LF 186 games/1512 innings;
CF 99 games/938 innings.

While the disparity is greater (see discussion on next point), 99 games should be enough to create positional ratings.

Quote:
Originally Posted by Garlon
In my original post I mentioned that we used a method to determine Estimated Defensive Innings for all players where real life data did not exist. The method we used is based on Bill James Win Shares pg 155-160. We took the extra step of creating the discrete estimated defensive inningouts for each of the 3 OF positions for every player.

The estimated data is very solid and it is the best that can be done. When we were creating the DB for the game, we even compared how close results from the estimated method were to actual known data for some seasons. The results were very good for a very large portion of the players in the DB - the method was less accurate for players who only had a handful of games played in a season.
I don't have the Win Shares book, but I do have the Bill James abstract, which defines the Estimated Defensive Innings formula. That formula should not create an inverted positional preference. (For example, Jackson played 59 games in CF and 35 in LF in real life, but in the neutralized data set he plays 75 in LF and still only 59 in CF.) This inversion affects the 3-year data above, although, as I indicated, I would still have expected ratings.

The other interesting note on the neutralized data is that, while it left the games played at CF unchanged at 59, it increased the innings from 472 to 558. That means he has to average about 9.5 innings per game played in CF (558 / 59), up from 8.0 (472 / 59).
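
The innings-per-game arithmetic can be checked with a quick sketch. It uses only the games and innings figures quoted in this post; the function and variable names are my own.

```python
def innings_per_game(innings, games):
    """Average defensive innings per game played at a position."""
    return innings / games

# Jackson's CF totals for the single season discussed above
real_cf = innings_per_game(472, 59)      # real-life data
neutral_cf = innings_per_game(558, 59)   # neutralized data

# Neutralized 3-year totals listed earlier in the post
neutral_lf_3yr = innings_per_game(1512, 186)
neutral_cf_3yr = innings_per_game(938, 99)

print(f"CF real-life:        {real_cf:.2f}")
print(f"CF neutralized:      {neutral_cf:.2f}")
print(f"LF neutralized 3-yr: {neutral_lf_3yr:.2f}")
print(f"CF neutralized 3-yr: {neutral_cf_3yr:.2f}")
```

The same pattern shows up in the 3-year neutralized totals: the CF figure comes out well above 9 innings per game, while LF stays close to 8.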

Quote:
We took the extra step of creating the discrete estimated defensive inningouts for each of the 3 OF positions for every player.
I don't know whether this extra step causes the change or not. For the early years some data is not available, so it is extrapolated to create the real-life stats. I would have assumed that Lahman, or whoever, used Estimated Defensive Innings to create the real-life data. Is there an issue if that formula is re-applied in the neutralized data set? I also noted that players appear to be "increased" to a statistical minimum (250 at bats, 40 games pitched). I did not think that was part of the 752 neutralization, but even if it is, that would certainly create the data for players to be rated at the positions they played.

I recognize this is pitching, not outfield, but Bobby Wallace and Jake Beckley both pitched 1 game in 1902 and none in 1901 or 1903. Yet in the 1901 start-up, both appear in the neutralized stats and both are rated at pitcher. By contrast, if a position player does not reach a minimum number of games at a position (it looks like 5 or 6, but more analysis is needed), no neutralized data is generated for that position, and of course he is not rated there.
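
To make the working premise concrete, here is a minimal sketch of the rule I have in mind, applied to Jackson's real-life 3-year games. The function name and the `min_games` parameter are hypothetical, and the 5-game cutoff is only my guess from the data above; none of this reflects OOTP's actual logic.

```python
# Real-life 3-year games by position for Jackson, from the figures earlier in this post
jackson_games = {"LF": 66, "CF": 60, "RF": 4}

def positions_to_rate(games_by_position, min_games=1):
    """Return positions with at least min_games games played, most-played first."""
    qualifying = [pos for pos, games in games_by_position.items() if games >= min_games]
    return sorted(qualifying, key=lambda pos: games_by_position[pos], reverse=True)

# Working premise: rate every position he actually played
rated_all = positions_to_rate(jackson_games)

# Same rule with a hypothetical 5-game cutoff (a guess, not a known OOTP value)
rated_cutoff = positions_to_rate(jackson_games, min_games=5)

print(rated_all)
print(rated_cutoff)
```

Under either setting CF qualifies; only the 4 games in RF fall away when a cutoff is applied.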

Again, these are just questions for understanding, with an eye to improving the historical capabilities.

Last edited by Bristolduke; 10-11-2010 at 02:54 PM.