Okay, let's see if I can cover things without taking 20 pages to do it.
First, a bit of background. It seems to me the fastest way to get the statistical data entered is to make the Excel files as easy to use as possible. The simpler they are, the quicker one can enter the info. But this approach raises a question: can these simple Excel files be recombined into something more complex later without a lot of trouble?
Now for some details.
BATTING STATS
The player stats for batting included in the Lahman Database are:
G AB R H 2B 3B HR RBI SB CS BB SO IBB HBP SH SF GDP
Naturally, the more of these statistical categories we can capture, the better, though of course we are limited to what's published in the TSN Guides.
Here are the individual batting stats recorded for the 1946 AA, listed in the same order as they appear in the Guide. Players who bat left or who are switch hitters are noted in the list.
10 or more games (ranked by batting average):
LastName FirstName Club G AB R H TB 2B 3B HR SH SB Avg.
Fewer than 10 games (ranked by batting average):
LastName FirstName Club G AB H Avg.
To me, the quickest and easiest way to enter the stats into an Excel file is for the file to use the same column order as the Guides. This means that one simply types the data just as it appears in the scan, without having to search or switch back and forth between columns (for our purposes we can skip entering TB and Avg., since these are calculated stats).
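Skipping TB and Avg. should lose nothing, since both can be recomputed from the columns that do get entered. Here is a minimal Python sketch of that, assuming the standard definitions (TB is one base per hit plus the extra bases on doubles, triples and home runs; Avg. is H divided by AB). The function names and the sample stat line are made up for illustration and are not taken from the Guide:

```python
def total_bases(h, doubles, triples, hr):
    """Total bases: one base for every hit, plus 1 extra per double,
    2 extra per triple, and 3 extra per home run."""
    return h + doubles + 2 * triples + 3 * hr


def batting_average(h, ab):
    """Hits divided by at-bats, rounded to three places.
    Returns None when there are no at-bats, to avoid dividing by zero."""
    if ab == 0:
        return None
    return round(h / ab, 3)


# Hypothetical stat line in the Guide's column order (not a real player).
row = {"G": 120, "AB": 400, "R": 60, "H": 120,
       "2B": 20, "3B": 5, "HR": 10, "SH": 4, "SB": 8}

tb = total_bases(row["H"], row["2B"], row["3B"], row["HR"])
avg = batting_average(row["H"], row["AB"])
print(tb, avg)
```

Recomputed values like these could also serve as a proofreading check against the TB and Avg. printed in the scan, if someone wants to spot-check an entry.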
The question which immediately arises is what to do with the players with fewer than 10 games. They have very limited data, so should we include such players? I would say no; they had so little playing time that I don't think it's worth including them. What is the rest of the team's opinion?
A number of important stats that are captured in Lahman, such as BB and SO, aren't included in the main AA batting list but are instead listed separately. The categories and their order are shown below:
Additional Batting - 10 or more games (ranked by RBI):
LastName Club G BB HBP RBI SO GDP
I definitely think these stats should be entered, but the fact they are included in a separate list raises some issues. The players are only listed by their last name, which is going to complicate the data entry. There are two ways to handle this, it seems to me.
The first way is to enter these extra stats on the main list in the Excel file. But this means that to enter the stats you'll have to scroll up and down the list of players to find the correct one before you can enter the numbers. This will get time-consuming very quickly and would slow down data entry significantly.
The second way is to enter these extra batting stats on their own worksheet; this would make the data input very quick. The drawback here is the question of how difficult it will be to recombine these other batting stats with the main list. Is more time lost in recombining the batting stats, or is more time lost in trying to enter all the batting data onto one single list?
I personally think it's the latter, but I want to hear everyone else's opinion as well, particularly from those who are knowledgeable about Excel and who can give some idea of how difficult it would be to combine the batting data from two different lists into one consolidated list.
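To give a rough idea of what combining the two lists might involve, here is a minimal Python sketch. It assumes both worksheets have been exported so that each row is a dictionary keyed by column name (the standard library's csv.DictReader produces rows like that), and it matches rows on LastName plus Club. The function name merge_batting, the key_fields parameter, and all the sample rows are hypothetical, made up for illustration. Anything that cannot be matched to exactly one player is set aside for checking by hand rather than guessed at:

```python
from collections import defaultdict

# Columns that exist only on the additional batting list.
EXTRA_COLS = ["BB", "HBP", "RBI", "SO", "GDP"]


def merge_batting(main_rows, extra_rows, key_fields=("LastName", "Club")):
    """Copy the extra stats onto the matching main-list rows.

    Returns (merged, problems). An extra row is merged only when its key
    matches exactly one main row; otherwise the key and the number of
    matches found (0 or 2+) go into problems for hand-checking.
    """
    # Map each key to the positions of the main rows that carry it.
    positions = defaultdict(list)
    for i, row in enumerate(main_rows):
        positions[tuple(row[f] for f in key_fields)].append(i)

    merged = [dict(row) for row in main_rows]  # copies; inputs stay untouched
    problems = []
    for extra in extra_rows:
        key = tuple(extra[f] for f in key_fields)
        hits = positions.get(key, [])
        if len(hits) == 1:
            for col in EXTRA_COLS:
                merged[hits[0]][col] = extra[col]
        else:
            problems.append((key, len(hits)))
    return merged, problems


# Made-up rows: two players named Smith on the same club.
main = [
    {"LastName": "Smith", "FirstName": "Al", "Club": "Club A", "G": 100},
    {"LastName": "Smith", "FirstName": "Bob", "Club": "Club A", "G": 45},
    {"LastName": "Jones", "FirstName": "Cal", "Club": "Club B", "G": 88},
]
extra = [
    {"LastName": "Jones", "Club": "Club B", "G": 88,
     "BB": 40, "HBP": 2, "RBI": 55, "SO": 30, "GDP": 7},
    {"LastName": "Smith", "Club": "Club A", "G": 100,
     "BB": 51, "HBP": 1, "RBI": 70, "SO": 44, "GDP": 9},
]

merged, problems = merge_batting(main, extra)
merged_g, problems_g = merge_batting(
    main, extra, key_fields=("LastName", "Club", "G"))
```

With the default key, the Jones row merges and the Smith row lands in problems because two main-list rows share that last name and club. Since G appears on both lists, adding it to the key settles the duplicate in this made-up case; whether it would settle every real one is something only the actual data can show.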
***End of part I***