01-12-2013, 11:37 AM #81
YZG
Moderator
Join Date: May 2012
Location: Canada
Posts: 367
Here we go: a brand-new Chinese set (1000 + 140). Chinese given names are as easy to gather as grains of sand on a beach. Technically, any combination of characters that pleases the parents and is not offensive can be used as a name. This is very convenient, since China doesn't seem to publish many statistics about its citizens' names. The only challenging part of this project is making sure that a combination is not considered feminine. Reaching 1000 was almost effortless. Family names are far less numerous, but those 140 cover close to 90% of the Chinese population! Perhaps even more, since many surnames transliterate to the same thing. For instance, 黎, 李, 理, 里, 郦/酈, 栗, 厉/厲, and 利 are all different surnames that all transliterate to "Li" (I left out the tone marks for simplicity's sake; I may include them in a future release if there's interest).

I built my Korean and Chinese sets following the family name / given name order. As such, the file "Chinese LN" (where LN stands for last names) contains given names, whereas "Chinese FN" (FN for first names) contains family names. If you guys prefer to apply the given name / family name format uniformly, please swap the sets accordingly.

There's also a Dutch set (593 + 2555). The data was taken from naamkunde.net, which seems reliable; I simply removed the foreign names.

My next targets will be Kazakh and "Central Asian" (I definitely don't mind taking on those sets from places no one around here hails from).

- YZG
Attached Files
Chinese FN.csv (1.4 KB, 64 views)
Chinese LN.csv (13.1 KB, 55 views)
Dutch FN.csv (8.8 KB, 64 views)
Dutch LN.csv (44.4 KB, 69 views)