CSV Exports
I like to use Microsoft Excel to mess with the CSV database dump, and a couple of things about the CSV exports are frustrating.
1) The league_history_fielding_stats.csv file contains a sub_league_id field, but league_history_batting_stats.csv and league_history_pitching_stats.csv do not. Instead, the batting and pitching CSVs distinguish subleagues with a team_id field that holds the ID of some team from that subleague. It would make a lot more sense, and make these files much easier to work with, if the batting and pitching files exported sub_league_id the way the fielding file does, instead of this goofy method of exporting the ID of one team from that subleague.
2) The players_at_bat_batting.csv file would be easier to work with if it included the date of each game. It does include the game_id, which I can use to look up the date, but automating what I want to do would be much simpler if players_at_bat_batting included the date (or at least the year) directly.
3) It is not possible to reconstruct the sequence of an inning easily and automatically from players_at_bat_batting.csv, because the export is ordered by player_id. Each entry includes the inning, base-out state, run differential, and lineup spot. The problem is that it's hard to get Excel to notice when the lineup wraps around from the 9 spot to the 1 spot: the only clue is a change in the base state, and writing a formula for that is complicated. This matters if you want to build your own run expectancy table in order to derive your own linear weights for stats like wOBA. That requires the exact order of plate appearances, which is hard to get without manually scanning thousands of innings for a wraparound from the 9 spot to the 1 spot. If each entry also had a field like "plate_appearance_id", starting at 1 for the first plate appearance of each game and counting up from there, the file would be much easier to work with: you could just sort it by game_id and plate_appearance_id and be done with it.
4) Not a problem, but the team history fielding stats export is named team_history_fielding_stats_stats.csv, whereas the batting and pitching files have "stats" in the name only once.
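For item 1, one stopgap until the export changes is to look up each row's subleague through its team_id. This is a minimal Python sketch, assuming the dump also contains a teams.csv with team_id and sub_league_id columns (an assumption; check your own export). team_to_sub_league and add_sub_league_id are illustrative helper names, not part of the game. Because it uses each team's current sub_league_id, it could mislabel older seasons if teams have moved between subleagues.

```python
import csv
import io
from typing import TextIO


def team_to_sub_league(teams_file: TextIO) -> dict[str, str]:
    """Build a team_id -> sub_league_id lookup.

    Assumes teams.csv has 'team_id' and 'sub_league_id' columns.
    """
    return {row["team_id"]: row["sub_league_id"] for row in csv.DictReader(teams_file)}


def add_sub_league_id(history_file: TextIO, lookup: dict[str, str], out: TextIO) -> None:
    """Copy a league_history_*_stats.csv file, appending a sub_league_id column."""
    reader = csv.DictReader(history_file)
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames or []) + ["sub_league_id"])
    writer.writeheader()
    for row in reader:
        # Unknown team IDs get an empty cell instead of raising KeyError.
        row["sub_league_id"] = lookup.get(row["team_id"], "")
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory example; with real files, open them with newline="".
    teams = io.StringIO("team_id,sub_league_id\n1,0\n2,1\n")
    history = io.StringIO("year,team_id,hr\n2023,1,200\n2023,2,180\n")
    out = io.StringIO()
    add_sub_league_id(history, team_to_sub_league(teams), out)
    print(out.getvalue())
```

The output file can then be pivoted in Excel on sub_league_id just like the fielding file.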
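For item 2, the date can be joined in through game_id in the same way. This sketch assumes a games.csv in the dump with game_id and date columns, and that dates are ISO-formatted (YYYY-MM-DD); both are assumptions to verify against your own files. game_dates and add_game_date are hypothetical helper names.

```python
import csv
import io
from datetime import date
from typing import TextIO


def game_dates(games_file: TextIO) -> dict[str, str]:
    """Map game_id -> date string (assumes 'game_id' and 'date' columns)."""
    return {row["game_id"]: row["date"] for row in csv.DictReader(games_file)}


def add_game_date(at_bats_file: TextIO, dates: dict[str, str], out: TextIO) -> None:
    """Copy players_at_bat_batting.csv, appending 'date' and 'year' columns."""
    reader = csv.DictReader(at_bats_file)
    writer = csv.DictWriter(out, fieldnames=list(reader.fieldnames or []) + ["date", "year"])
    writer.writeheader()
    for row in reader:
        d = dates.get(row["game_id"], "")
        row["date"] = d
        # Assumes ISO-style dates; adjust this line if your dump formats them differently.
        row["year"] = str(date.fromisoformat(d).year) if d else ""
        writer.writerow(row)


if __name__ == "__main__":
    games = io.StringIO("game_id,date\n10,2023-04-01\n11,2024-05-02\n")
    at_bats = io.StringIO("player_id,game_id,inning\n5,10,1\n5,11,3\n")
    out = io.StringIO()
    add_game_date(at_bats, game_dates(games), out)
    print(out.getvalue())
```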
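To see why the wraparound in item 3 is awkward, consider one team's half-inning in isolation. The Python sketch below (order_half_inning and sort_by_pa are hypothetical helpers) recovers the batting order from lineup spots alone: if fewer than nine batters came up, the spots form one contiguous run around the lineup, and the leadoff batter is the only spot whose predecessor did not bat. Once the whole lineup turns over, every spot has a predecessor and this trick fails, which is exactly where a plate_appearance_id would help. It assumes you can already group rows by game, batting team, and inning.

```python
LINEUP_SIZE = 9


def previous_spot(spot: int) -> int:
    """Lineup spot that bats just before `spot` (the 1 spot follows the 9 spot)."""
    return (spot - 2) % LINEUP_SIZE + 1


def order_half_inning(spots: list[int]) -> list[int]:
    """Return the lineup spots of one team's half-inning in batting order.

    Only works when fewer than nine batters came up, so no spot repeats.
    """
    present = set(spots)
    if len(present) != len(spots) or len(present) >= LINEUP_SIZE:
        raise ValueError("ambiguous: the lineup turned over, so spots alone can't order it")
    leadoff = [s for s in present if previous_spot(s) not in present]
    if len(leadoff) != 1:
        raise ValueError("spots are not one contiguous run of the lineup")
    start = leadoff[0]
    # Walk forward from the leadoff spot, wrapping 9 -> 1.
    return [(start - 1 + i) % LINEUP_SIZE + 1 for i in range(len(present))]


def sort_by_pa(rows: list[dict[str, str]]) -> list[dict[str, str]]:
    """Order rows by game, then by the proposed (not yet existing) plate_appearance_id."""
    return sorted(rows, key=lambda r: (int(r["game_id"]), int(r["plate_appearance_id"])))


if __name__ == "__main__":
    print(order_half_inning([1, 8, 9, 2]))  # the 8 hitter led off, then 9, 1, 2
```

With the proposed field, all of that heuristic collapses into sort_by_pa, a single two-key sort that works no matter how many batters came up.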
Anyway, these are small-potatoes requests, but boy, would they make my hobby easier!
Last edited by matskralc!; 09-03-2023 at 02:12 PM.