10-14-2007, 11:14 PM #18
Garlon
Hall Of Famer
 
Join Date: Jun 2004
Posts: 4,268
An update:

I converted the batting lines of all players from 1871-2006 to a park-neutral environment of 4.63 runs per game. On baseball-reference.com you can open any player's page, click "neutralize stats," and choose the 750-runs option. The neutralized stats shown there are essentially the same as what will be in this DB. I carried 10 decimal places through the conversion before rounding the stat totals, so there are slight discrepancies of a couple of hits or AB here or there over the course of a player's career.
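The 750-runs option works out to 750 / 162 games, or about 4.63 runs per game. The sketch below does not reproduce Bill James's conversion; it uses made-up hit totals and hypothetical per-season multipliers only to show why carrying full precision and rounding once at the end can differ by a hit or so from rounding every season line.

```python
# 750 runs over a 162-game schedule is the target environment: about 4.63 R/G.
TARGET_RPG = 750 / 162


def neutral_career_total(seasons, round_each_season):
    """Sum converted season totals for one stat.

    seasons: list of (raw_count, factor) pairs, where factor is a hypothetical
    per-season conversion multiplier (not Bill James's actual formula).
    """
    total = 0.0
    for raw_count, factor in seasons:
        converted = raw_count * factor
        # Either round every season line or keep full precision until the end.
        total += round(converted) if round_each_season else converted
    return round(total)


# Made-up hit totals and multipliers, chosen only to show the rounding effect.
seasons = [(100, 1.004), (200, 1.002), (50, 1.008)]
print(round(TARGET_RPG, 2))                  # 4.63
print(neutral_career_total(seasons, True))   # 350
print(neutral_career_total(seasons, False))  # 351
```

Each converted season here ends in about .4, so rounding season by season drops those fractions while rounding once at the end keeps them.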

Ty Cobb's career totals
Baseball-reference vs Garlon totals

G: 3216 vs 3216
AB: 12408 vs 12407
R: 2609 vs 2610
H: 4732 vs 4730
2B: 808 vs 809
3B: 337 vs 335
HR: 128 vs 128
RBI: 2253 vs 2252
BB: 1392 vs 1392
SB: 1027 vs 1029

The conversion method used by Baseball-Reference was published by Bill James in his Historical Abstract (pp. 740-743), where he converts Willie Davis's career into this context.

Willie Davis career totals

Baseball-reference vs Bill James vs Garlon

G: 2439 vs 2429 vs 2439
AB: 9496 vs 9473 vs 9494
R: 1457 vs 1462 vs 1456
H: 2858 vs 2860 vs 2858
2B: 441 vs 447 vs 441
3B: 153 vs 154 vs 153
HR: 199 vs 201 vs 199
RBI: 1248 vs 1250 vs 1244
BB: 465 vs 459 vs 465
SO: 981 vs 977 vs 981
SB: 443 vs 457 vs 443


After finishing that, I filled in missing player strikeout totals for 1897-1912, estimating each player's strikeouts from his team's K:Out ratio in that season.
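A minimal sketch of the team K:Out estimate, assuming a batter's "outs" are simply AB minus hits (ignoring sacrifices, double plays, and caught stealing) and that team strikeout totals are known for the season. The function names and the sample numbers are hypothetical.

```python
def batter_outs(ab, h):
    """Hypothetical definition of outs for a batter: at-bats minus hits."""
    return ab - h


def estimate_strikeouts(player_ab, player_h, team_k, team_ab, team_h):
    """Estimate a player's missing strikeouts from his team's K:Out ratio."""
    team_k_per_out = team_k / batter_outs(team_ab, team_h)
    return round(batter_outs(player_ab, player_h) * team_k_per_out)


# Made-up team: 500 K on 3700 outs; player: 550 AB, 160 H -> 390 outs.
print(estimate_strikeouts(550, 160, 500, 5000, 1300))  # 53
```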

I also adjusted the 1871-1875 and 1886-1897 stolen base totals to match the frequency of stolen bases per time reaching base in 1897-1909. The definition of a stolen base was different or inconsistent before 1897, so this adjustment was needed. The 1876-1885 period has no stolen base totals at all, so I estimated those with a formula based on SB per time reaching base from 1886-1897. All of this was done on the data after converting it to the park- and run-neutral environment.
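One way the two stolen base steps could look, as a sketch only: it assumes "times reaching base" means hits plus walks (HBP and errors ignored), rescales an era's SB by the ratio of a reference SB-per-TOB rate to the era's rate, and fills seasons with no SB data at a single reference rate. The actual formula used for the DB is not given, so every name and number here is hypothetical.

```python
def times_on_base(h, bb):
    """Assumed definition: hits plus walks (HBP and errors ignored)."""
    return h + bb


def sb_rate(total_sb, total_tob):
    """Stolen bases per time reaching base for a group of seasons."""
    return total_sb / total_tob


def rescale_sb(player_sb, era_rate, reference_rate):
    """Scale recorded SB so the era matches the reference SB frequency."""
    return round(player_sb * reference_rate / era_rate)


def estimate_missing_sb(player_h, player_bb, reference_rate):
    """Fill a season with no SB data at the reference SB-per-TOB rate."""
    return round(times_on_base(player_h, player_bb) * reference_rate)


# Made-up totals: era rate 0.15 SB per TOB, reference rate 0.10.
era = sb_rate(6000, 40000)
ref = sb_rate(3000, 30000)
print(rescale_sb(60, era, ref))           # 40
print(estimate_missing_sb(150, 50, ref))  # 20
```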

Caught stealing data is essentially unavailable for 1871-1919. Some data exists for 1914-1915, showing a recorded stolen base success rate of 55%, and the 1920-1925 period also yields a 55% success rate. I have finished estimating caught stealing for all players from 1871-1899 using a formula that sets the league average at 55% but lets individual players land above or below that mark. Generally speaking, great players tend to come out with an above-average SB% under this method.
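The post does not spell out the caught stealing formula, so the following is a hypothetical stand-in with the same two properties: the league SB% comes out at exactly 55%, and individual players vary around it. It fixes league CS from SB / (SB + CS) = 0.55 and then shares that CS out in proportion to times on base rather than to SB, so players who steal more often per time on base end up above 55%.

```python
LEAGUE_SB_PCT = 0.55


def estimate_caught_stealing(players):
    """players: dict of name -> (sb, times_on_base); returns name -> estimated CS.

    Hypothetical method: league CS is fixed so league SB% is 55%, then it is
    divided among players in proportion to times on base.
    """
    league_sb = sum(sb for sb, _ in players.values())
    league_tob = sum(tob for _, tob in players.values())
    # SB / (SB + CS) = 0.55  =>  CS = SB * (1 - 0.55) / 0.55
    league_cs = league_sb * (1 - LEAGUE_SB_PCT) / LEAGUE_SB_PCT
    return {name: league_cs * tob / league_tob for name, (_, tob) in players.items()}


# Made-up league of two players with equal times on base.
players = {"A": (55, 200), "B": (11, 200)}
cs = estimate_caught_stealing(players)
for name, (sb, _) in players.items():
    print(name, round(cs[name], 1), round(sb / (sb + cs[name]), 3))
# A 27.0 0.671
# B 27.0 0.289
```

Because the league CS total is fixed before it is divided, the league-wide SB% stays at 55% no matter how it is split; rounding each player's CS to a whole number would move it slightly.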

I still have to fix the missing Caught Stealing data for the NL and AL from 1900-1919, and the NL from 1926-1950.

After that is finished, we have to go in and make a host of minor fixes:
-Pro-rate the missing WWII seasons for players

-Adjust players who had only a cup of coffee and have very low stat totals. We will set these players to replacement level.

-Adjust low-AB seasons of players who had at least 500 career AB. For example, after neutralizing his stats, Ty Cobb had only 163 AB in 1905. We will bring him up to 251 AB by crediting him with 88 extra AB pro-rated at his career average (88 AB at his career average of about .381 adds 34 hits). So in 1905, instead of 163 AB and 43 hits (.264), he will have 251 AB and 77 hits (.307). This type of edit will alleviate a lot of issues in OOTP involving the ratings of players with low AB totals, since everyone in the DB will be adjusted to a minimum of 251 AB in any given season.

-Fix gap seasons in the playing records of all players. For example, Willie Mays did not play in 1953 (military service), so if you start a league in 1953, Mays will not be in the game. Nor will he import in 1954, because he was not a rookie in 1954, so he will be missing from your universe forever. In any given season in history there are players who missed a year here or there due to injury, military service, or being sent down to the minor leagues for an entire season. We will fill in these "gap seasons" in the records of all players by averaging their seasons before and after each gap to estimate their performance level. Willie Mays in 1953 will have a stat line of 366 AB and 120 hits (.328).
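The two batting fixes described above (padding low-AB seasons to 251 AB at the career average, and filling a gap season by averaging its neighbors) can be sketched as below. Only AB and hits are handled; the function names are hypothetical, and the gap-fill example uses made-up seasons rather than Mays's actual lines.

```python
MIN_AB = 251         # minimum season AB described above
MIN_CAREER_AB = 500  # only players with at least this many career AB are padded


def pad_season(season_ab, season_h, career_ab, career_h):
    """Credit extra AB at the player's career average up to MIN_AB."""
    if career_ab < MIN_CAREER_AB or season_ab >= MIN_AB:
        return season_ab, season_h
    extra_ab = MIN_AB - season_ab
    extra_h = round(extra_ab * career_h / career_ab)
    return MIN_AB, season_h + extra_h


def fill_gap(before, after):
    """Estimate a missing season's (AB, H) as the average of the adjacent seasons."""
    return tuple(round((b + a) / 2) for b, a in zip(before, after))


# Cobb 1905 with his neutralized career totals (12407 AB, 4730 H).
ab, h = pad_season(163, 43, 12407, 4730)
print(ab, h, round(h / ab, 3))               # 251 77 0.307
# Made-up seasons on either side of a gap year.
print(fill_gap((300, 90), (500, 160)))        # (400, 125)
```

Note that Python's `round` sends exact halves to the nearest even number, so an average like 400.5 AB would become 400.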

Once we do all of that, the batting files will be finished. Then we need to work on the pitching and fielding files.