#1
All Star Starter
Join Date: Apr 2003
Location: Massachusetts
Posts: 1,179
CBBL's 1890 Start Modifications
See this thread for an introduction to the various mods I'll be posting in this and other subforums.
First, names. As my league starts in 1890 and aims to be era-appropriate, I needed to update both the first_names_english.txt and names_english.txt files for the era. This involved pulling data from namecensus.com and its former URL, names.mongabay.com, under which some data files still reside.

First names: Namecensus.com has historical name databases going back to 1880. The male first-name databases contain anywhere from 1,800 to over 8,000 names. The database does not distinguish the ethnicity of first names (it does for surnames), so using the data has some challenges, particularly as you get deeper into the 20th century.

Normally, when I use modified name files, I target name sets from 20 years before the draft year, as that's a good representation of what the players being drafted would be called. For my 1890 start date, I can't do this, since the oldest data is from 1880, so I'll have to compromise a bit. The positive side is that I won't have to reimport the first names database for 20 years, when my universe gets closer to 1910.

Another thing to note: because the data comes primarily from U.S. Census records, it contains errors, for example, females who were recorded as males (Betty, Lucy, Margaret, Sue, etc.). So some editing is necessary, and I will constantly need to tinker with the file to remove these obvious errors. The same goes for obviously non-Caucasian names. Lastly, there are numerous abbreviations and spelling errors that may need adjustment (e.g., Chas for Charles). I usually discover these once the file is imported into OOTP and then fix them in the original file for future universes.

To keep frequencies in a range OOTP can handle, I rescale the name frequencies so that the most popular name has a frequency of 50,000. I don't know about OOTP22, but I've found with earlier versions that if the frequencies were too large, OOTP wouldn't load the database correctly. I may experiment with larger numbers to see. SO....
Attached are the first-name files for the decades 1880 through 1920. The files only replace ethnicity ID 0. Many female names and misspellings have been removed, but I haven't touched non-European names (Hispanic, Asian, etc.). Some female names and other errors may still remain. I'll post another message about the last-name modifications after this one. Also, I am happy to share the Excel files that I use to create and work with these .txt files. Let me know and I can share them with you.
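For anyone who would rather script the first-name steps described above than do them by hand (picking data from about 20 years before the draft year, dropping miscoded names, merging abbreviations like Chas into Charles, and rescaling so the top name gets 50,000), here is a rough sketch. It is not the Excel workflow mentioned above: the `(name, count)` input layout, the helper names, the exclusion set, and the abbreviation map are all hypothetical, and it does not write OOTP's .txt format.

```python
# Hypothetical sketch of the first-name preparation steps; input layout,
# helper names, exclusion set, and abbreviation map are illustrative assumptions.
from collections import Counter

AVAILABLE_DECADES = [1880, 1890, 1900, 1910, 1920]  # decades covered by the attached files
TARGET_MAX = 50000  # frequency given to the most popular name


def pick_decade(draft_year, lag=20):
    """Decade of name data for (draft_year - lag), rounded down and clamped to the available range."""
    target = (draft_year - lag) // 10 * 10
    return min(max(target, AVAILABLE_DECADES[0]), AVAILABLE_DECADES[-1])


def clean_names(rows, exclude, abbreviations):
    """Map abbreviations to full spellings, merge their counts, and drop excluded names."""
    totals = Counter()
    for name, count in rows:
        name = abbreviations.get(name, name)  # e.g. "Chas" -> "Charles"
        if name in exclude:  # e.g. female names miscoded as male
            continue
        totals[name] += count
    return dict(totals)


def rescale(totals, target_max=TARGET_MAX):
    """Scale counts linearly so the top name equals target_max; floor at 1 so rare names survive."""
    top = max(totals.values())
    return {name: max(1, round(count * target_max / top)) for name, count in totals.items()}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    raw = [("John", 1000), ("William", 800), ("Chas", 100), ("Charles", 400), ("Margaret", 5)]
    names = clean_names(raw, exclude={"Margaret"}, abbreviations={"Chas": "Charles"})
    print(pick_decade(1890))  # 1880 (1870 is clamped to the oldest available data)
    print(rescale(names))     # {'John': 50000, 'William': 40000, 'Charles': 25000}
```

The linear rescale keeps the relative popularity of names intact; the floor of 1 is an assumed choice so that very rare names are not rounded down to a frequency of zero.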
#2
All Star Starter
Join Date: Apr 2003
Location: Massachusetts
Posts: 1,179
OK, last names. I probably should have started a new thread for this one, as this file can be used for any era.
Again, the data comes from the old names.mongabay.com website. Unfortunately, as part of the site's reorganization, certain tables appear to be no longer available. The data I pulled has just shy of 50,000 names and provides statistics indicating whether each name is used by people who primarily identify as white, black, Asian/Pacific, North American native, Hispanic, or a combination of the above. This makes it easy to split the data into subsets by ethnicity. Because my league starts so far back in history, I've not yet created subfiles for anything but Caucasian names, but I can do so if anyone wants them before they become necessary for my universe.

Unlike the first-name data, this data is sourced solely from the 2000 U.S. Census, but separating it by ethnicity does enable a reasonable replication of a historical nameset. For Caucasian names, certain sub-ethnicities, such as Italian, may need further filtering for greater realism in the era. I have not done this.

To keep as few non-Caucasian names as possible in my inaugural file, I only used names where at least 30% of respondents identified themselves as non-Hispanic white. This is no guarantee, and further cleanup of the data file will be necessary, a problem unlikely to exist when the subsets are created for African-American, Asian, and Hispanic names. In any event, this filtering still yields over 43,000 Caucasian surnames, which have been incorporated into ethnicity ID 0 in the attached file. The African-American nameset, under ethnicity 39, includes about 3,300 names, and the Hispanic nameset, under a NEW ethnicity 41 (modification of the world_default.xml file required), contains about 3,600 names. Given the size of the file, only these three namesets are included in the attachment, which still had to be compressed before it could be uploaded to this board.
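The 30% cutoff described above amounts to a simple threshold filter on one percentage column. The sketch below is not the Excel workbook mentioned in this thread; it assumes a CSV with columns named `name`, `count`, and `pctwhite` (modeled loosely on the Census Bureau's 2000 surname table, with `pctwhite` assumed to hold the non-Hispanic white share), and it treats suppressed cells such as `(S)` as unknown and drops them.

```python
# Hypothetical surname filter; the column names are assumptions modeled on
# the Census Bureau's 2000 surname table, not a confirmed layout of the posted data.
import csv
import io


def parse_pct(value):
    """Return a percentage as float, or None for suppressed/non-numeric cells such as "(S)"."""
    try:
        return float(value)
    except ValueError:
        return None


def filter_surnames(csv_text, column="pctwhite", min_pct=30.0):
    """Keep (name, count) pairs whose percentage in `column` is at least min_pct."""
    kept = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        pct = parse_pct(row[column])
        if pct is not None and pct >= min_pct:  # unknown shares are dropped, not guessed
            kept.append((row["name"], int(row["count"])))
    return kept


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    sample = "name,count,pctwhite\nALPHA,100,72.5\nBETA,50,30\nGAMMA,40,12.1\nDELTA,10,(S)\n"
    print(filter_surnames(sample))  # [('ALPHA', 100), ('BETA', 50)]
```

Pointing the same function at another column (for example a hypothetical `pctblack` or `pcthispanic`) would produce candidate lists for other ethnicities, but the cutoffs used for ethnicities 39 and 41 are not stated here, so any threshold there would be a guess.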
As with the first-name file, I'm happy to share the Excel file I used to build and adjust this nameset, and I will update this file from time to time as I tweak it. Comments and suggestions welcome.
Last edited by cbbl; 05-23-2021 at 07:26 PM.
#3
Hall Of Famer
Join Date: Jun 2011
Posts: 3,703
These are fantastic!!!!
Thanks so much for posting these and your other mods. And thanks especially for listing your sources.
__________________
American-Ethnic (and Canadian) Namesets
Historical Minor League Schedules
1870s City/Team Nickname Randomizers
"It's Usually Sunny in Philadelphia" weather mod
Negro League Schedules
Last edited by joefromchicago; 05-24-2021 at 12:58 AM.
#4
Hall Of Famer
Join Date: Dec 2001
Location: Ontario Canada
Posts: 9,740
Yes, outstanding. Thank you for sharing them.
__________________
Cliff Markle HOB1 greatest pitcher 360-160, 9 Welch Awards, 11 WS titles