
More detailed info on the ghost player problem
dougaiton and DaveHorn,
Thanks. Well, to be honest, I didn't expect any replies like this - ones with praise in them! I have to confess that I was writing in a ranting mood, so I thought this thread might be less well received or outright ignored. You have truly made me feel much better now. Thanks again.
****************
Now, I have collected more info on this ghost player problem and I will share it here. The info was obtained from a 2-league, 12-team league with 30 years of simmed history. This is basically the same setup as the league in my previous league-wide rating level study for v6.01 ( http://www.ootpdevelopments.com/boar...ad.php?t=64539 ). The 'headcount' of ghost players was not done by physically searching for them; rather, it was done by finding players that showed up in the CSV roster files but not in the ALMANAC files. The relationship between inaccurate CSV info and ghost players was established and explained in the previous post. The following is a table of ghost player composition. The explanations of the entries are at the bottom.
Code:
(1) Table for ghost player composition

# of players (headcount)
                    Generated    Ghost    Percentage(%)
Position players         2043       90          4.405 %
Pitchers                 1693      110          6.497 %
All                      3736      200          5.353 %

# of PYA (player yearly appearance)
                    Generated    Ghost    Percentage(%)
Position players        20285      989          4.876 %
Pitchers                16946     1332          7.860 %
All                     37231     2321          6.234 %

Note: (1) 'Generated' represents league-generated players from initial league
          creation and all draftees until year 29. (The draftees of the
          current/last year do not count, as they haven't had the chance to
          become ghost players: they are new to the league and have zero
          pro years.)
      (2) 'Ghost' represents ghost players, that is, players that are
          supposed to have exited the league but were not properly removed
          from the database.
      (3) 'Percentage' is just the ratio Ghost / Generated, as a percentage.
      (4) 'PYA' stands for player yearly appearance. This is defined as how
          many times a particular player has shown up (as in year count) in
          the whole group data. So, for example, a player with 10 years in
          the league (from draft to retirement) will have 10 player yearly
          appearances in the data.
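For anyone who wants to try the same kind of headcount, here is a minimal Python sketch of the idea described above: take every player ID that shows up in the yearly CSV rosters, subtract the IDs found in the almanac, and count yearly appearances. The function names, the dict-of-lists input, and the toy IDs are all made up for illustration; they are not the actual OOTP file layouts, and real CSV/almanac files would need their own parsing first. It also assumes that a ghost player's PYA counts every year the player appears in the data, which is only my reading of the definition in note (4).

```python
from collections import Counter


def find_ghosts(csv_ids_by_year, almanac_ids):
    """Return IDs seen in any yearly CSV roster but absent from the almanac."""
    seen_in_csv = set()
    for ids in csv_ids_by_year.values():
        seen_in_csv.update(ids)
    return seen_in_csv - set(almanac_ids)


def yearly_appearances(csv_ids_by_year):
    """PYA per player: the number of yearly rosters in which the ID appears."""
    pya = Counter()
    for ids in csv_ids_by_year.values():
        pya.update(set(ids))  # set() so a duplicate row counts once per year
    return pya


# Hypothetical toy data: sim year -> player IDs in that year's CSV roster.
rosters = {
    1: [101, 102, 103],
    2: [101, 102, 103, 104],
    3: [101, 103, 104],
}
# Hypothetical set of IDs that also appear in the almanac.
almanac = [101, 102, 104]

ghosts = find_ghosts(rosters, almanac)
pya = yearly_appearances(rosters)
ghost_pya = sum(pya[player_id] for player_id in ghosts)
total_pya = sum(pya.values())

print("ghost IDs:", sorted(ghosts))
print("headcount: %d of %d" % (len(ghosts), len(pya)))
print("PYA: %d of %d" % (ghost_pya, total_pya))
```

Note that this compares plain IDs only. Since ghost player IDs can be recycled, a real run would probably need to match on more than the ID (name or draft year, for example) to avoid mixing up two different players.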
The next part is the comparison between permanent and temporary ghosts. As discussed in the previous post, all ghost players are temporary. However, if their player IDs are never recycled, they will appear as 'permanent'. Below is the comparison between still-existing ghost players (after 30 years, considered 'permanent' ghost players here) and all ghost players that ever existed. The explanations of the entries are at the bottom.
Code:
(2) Table for Permanent/Temporary ghost player composition

Ghost player number count after 30 years
                    Ghost    Total    Percentage(%)
Position players       31       90         34.444 %
Pitchers               47      110         42.727 %
All                    78      200         39.000 %

Note: (1) 'Ghost' is the ghost player number count after the 30-year sim
          ('permanent' ghost players).
      (2) 'Total' is the total ghost player number count during the 30-year
          sim period ('all' ghost players).
As can be clearly seen, the proportion of ghost players cannot be considered insignificant. Perhaps this warrants treating it as a serious problem that needs to be addressed.
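If you want to recheck the percentages in table (2), the arithmetic is just permanent count divided by total ghost count. Here is a small Python sketch using the counts from the table; the `pct` helper and the dict layout are only my own naming for illustration.

```python
def pct(part, whole):
    """Percentage of part in whole, rounded to 3 decimals as in the tables."""
    return round(100.0 * part / whole, 3)


# (permanent ghosts after 30 years, all ghosts during the 30-year sim)
table2 = {
    "Position players": (31, 90),
    "Pitchers": (47, 110),
}
# The 'All' row is the sum of the two rows above.
table2["All"] = (
    sum(ghost for ghost, _ in table2.values()),
    sum(total for _, total in table2.values()),
)

for label, (ghost, total) in table2.items():
    print("%-18s %4d %5d %8.3f %%" % (label, ghost, total, pct(ghost, total)))
```

The same `pct` helper reproduces table (1) as well, for example `pct(200, 3736)` for the 'All' headcount row.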
While fixing the ghost player problem may be an important issue, CSV info accuracy should be of even higher priority, in my opinion. I seem to be in the minority who view CSV as an essential and indispensable tool. However, it is a really good tool when you plan to do some studies regarding player ratings, or just want to use CSV for global rating editing/tweaking in general.
Well, that's it for the ghost player issue. Thanks for reading.