|
The fake pitching lines are especially problematic because they show up for both minor leaguers and MLB players. Even when I do audits (like the ongoing city of birth fixes, year of birth fixes, height, weight, etc.), I generally limit myself to AL/NL and NeL.
There is a pattern to those P lines, though, and they are obvious once you start looking for them. We are getting to the point where, once I have some other projects behind me, I could take the huge MiLB csv, re-sort it by Historic Minors ID and then year, and feed it to the AI a certain number of rows at a time so it can tell me who looks suspect based on the repetitive pattern of those lines. That saves me the search time, and I only burn my OOTP help time on targeted deletions of the identified lines.
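A rough sketch of the sort-and-batch step, just to show the idea. The column names `historic_minors_id` and `year` are my placeholders, not the real headers in the csv, and the batch size is arbitrary. It keeps all of a player's rows together in one batch so the repetitive pattern stays visible.

```python
import csv
from itertools import groupby

# Placeholder column names (assumptions); swap in the real csv headers.
ID_COL = "historic_minors_id"
YEAR_COL = "year"


def sorted_rows(csv_file):
    """Read all rows from an open csv file and sort by player ID, then year."""
    rows = list(csv.DictReader(csv_file))
    rows.sort(key=lambda r: (r[ID_COL], int(r[YEAR_COL])))
    return rows


def batches(rows, max_rows=200):
    """Yield lists of rows, never splitting one player across two batches.

    A single player with more than max_rows rows still goes out as one
    oversized batch.
    """
    batch = []
    for _, player_rows in groupby(rows, key=lambda r: r[ID_COL]):
        player_rows = list(player_rows)
        if batch and len(batch) + len(player_rows) > max_rows:
            yield batch
            batch = []
        batch.extend(player_rows)
    if batch:
        yield batch
```

To use it, open the csv with `open(path, newline="")`, pass the file object to `sorted_rows`, and loop over `batches(rows)`. Each batch is a list of dicts that can be written back out as text for review.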
For now, they generally don't cause too much harm, which is why this is still just a project. But every bit helps on a db as massive as this one.
|