01-01-2013, 09:12 PM   #26
JeffR
FHM Producer
 
Join Date: Apr 2002
Location: Kelowna, BC
Posts: 17,259
Quote:
Originally Posted by geckon
1) What encoding the file with the Czech special characters included should have? I suppose it's UTF-8 but I rather ask...
Yes, UTF-8.

Quote:
2) Where did you get those names? There are some (e.g. Jeremy, Georg, Nikolas etc.) I don't consider Czech although I admit there could be a Czech man using such name - but it would be _very_ rare. Is it possible to delete names from the file to make it more accurate?
I started with the names of all the Czech-nationality players in the database and then added some from things like older seasons, lists of Czech sportsmen, and so on. So "Jeremy" is from Jeremy Tichý (Plzen U-20 player), "Nikolas" is from Nikolas Tverdak (Chomutov U-20), etc. I'd like to keep the really rare names in, but if anything looks completely wrong, take it out.

Quote:
3) Is it possible to add new names to the files? I missed some names after a brief scan of the files (like Miroslav or Radovan).
Sure, go ahead.

Quote:
4) I suppose the first number after a name expresses how often the name should occur. Is that right? Did you get those numbers from some statistics? There are some which I feel they are not right (like Jaromír or Josef) although I don't have any real numbers in my hands now. I could probably find and use some statistics to correct the numbers. Is such effort desired?
Yes. All the frequency numbers are added together, and the chance of an individual name being used is X/Total, where X is that name's frequency number. Those files are based mainly on the number of players with that name appearing in the database, with a few modifications to make the more common names more frequent.
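
To make the X/Total rule concrete, here is a minimal Python sketch. It is not the game's actual code; the names and frequency numbers are made-up placeholders, and `name_chances` and `pick_name` are hypothetical helpers written only for illustration.

```python
import random

# Placeholder entries: (name, frequency number). Values are illustrative only.
names = [("Jan", 40), ("Petr", 30), ("Jaromír", 20), ("Jeremy", 1)]

def name_chances(entries):
    """Return each name's chance of being used: its frequency / total."""
    total = sum(freq for _, freq in entries)
    return {name: freq / total for name, freq in entries}

def pick_name(entries, rng=random):
    """Draw one name, weighted by its frequency number."""
    candidates = [name for name, _ in entries]
    weights = [freq for _, freq in entries]
    return rng.choices(candidates, weights=weights, k=1)[0]

chances = name_chances(names)
print(chances["Jeremy"])   # 1 / 91, since the frequencies sum to 91
print(pick_name(names))
```

Because only the ratios matter, multiplying every frequency number by the same factor leaves the chances unchanged.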

Quote:
5) What does the second number express?
It's the nationality identifier for the name; 26 is Czech.

Quote:
My plan is to add more names to the file, take some name frequency statistics and correct the freq numbers accordingly. I must admit though that I think I won't make it all done today. Would you like to see such work done or are you not interested in this?
Definitely, go ahead and do it. I don't need it right away; the other name files are a long way from being done - I just happened to have the Czech one done already.

Quote:
I searched for Czech name databases and found some. I have three possible plans. Please tell me which one you find better for the game.
I think 1) sounds like the best option. The main problem with 2) is the Slovak names - the Slovaks will have their own name list, and the way naming works, we can just specify that 2% (or whatever the exact number is) of Czech-nationality players get Slovak names - so I'd like to keep the name lists (European ones, at least) as single-nationality as possible.
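
As a rough illustration of why single-nationality lists are enough, the sketch below first decides which list a player draws from and then picks a name from that list. This is only an assumption about how such a rule could be wired up, not the game's real implementation; the lists, the 2% share, and the helper names are placeholders.

```python
import random

# Placeholder single-nationality lists of (name, frequency number).
NAME_LISTS = {
    "Czech": [("Jan", 3), ("Petr", 2)],
    "Slovak": [("Marek", 1), ("Juraj", 1)],
}

# Assumed share of players of one nationality who get a name from another
# nationality's list. 0.02 stands in for "2% (or whatever the exact number is)".
FOREIGN_NAME_SHARE = {"Czech": {"Slovak": 0.02}}

def choose_list(nationality, rng):
    """Decide which nationality's name list this player draws from."""
    roll = rng.random()
    cumulative = 0.0
    for other, share in FOREIGN_NAME_SHARE.get(nationality, {}).items():
        cumulative += share
        if roll < cumulative:
            return other
    return nationality

def pick_name(nationality, rng):
    """Pick a name from the chosen list, weighted by frequency number."""
    entries = NAME_LISTS[choose_list(nationality, rng)]
    candidates = [name for name, _ in entries]
    weights = [freq for _, freq in entries]
    return rng.choices(candidates, weights=weights, k=1)[0]

print(pick_name("Czech", random.Random()))
```

With this split, the Slovak names live only in the Slovak list, and the cross-over rate can be tuned without touching either file.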

Quote:
(Valid only for the first plan) What year to pick? Since I suppose the first newgens will be generated for the 2013/2014 season it should probably be something like 1990-2000. Do you agree? Which one would you pick and why?
Maybe 1993-present. Since the names are going to be for future players, I think current naming trends should be reflected. (Even though it's a little disturbing to see character names from Twilight, Harry Potter, and so on in the newest lists.)

Quote:
(For developers only) What range is used for the name frequency in the name generator? I need to know that to be able to normalize the real frequency numbers.
The frequency numbers are added together for each nationality, and the chance of an individual name appearing is (its frequency number / total number). So they're only relative to each other, and there's no specific range to use. But don't try to make them represent the real-life numbers perfectly. It's better to set a cutoff where all names below a certain percentage of the population get a 1, set the frequencies for the more common names in proportion to that, and then lower the numbers for the common names by at least 50%. (For some reason, if you try to match real life perfectly, it just doesn't look right - there are far too many common names.)
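
One possible reading of that recipe, sketched in Python. The cutoff value, the exact 50% damping factor, the rounding, and the sample counts are all assumptions chosen for illustration; `to_game_frequencies` is a hypothetical helper, not part of the game.

```python
def to_game_frequencies(counts, cutoff_share=0.002, damping=0.5):
    """Convert real-life name counts into relative frequency numbers.

    counts:       {name: number of people with that name} (placeholder data)
    cutoff_share: names at or below this share of the population get a 1
    damping:      factor applied to the common names (0.5 = lowered by 50%)
    """
    total = sum(counts.values())
    result = {}
    for name, count in counts.items():
        share = count / total
        if share <= cutoff_share:
            result[name] = 1
        else:
            # Scale relative to the cutoff (a name at the cutoff would be 1),
            # then damp the common names so they do not dominate.
            proportional = share / cutoff_share
            result[name] = max(1, round(proportional * damping))
    return result

# Made-up counts, only to show the shape of the result.
sample = {"Jan": 5000, "Petr": 3000, "Miroslav": 1990, "Jeremy": 10}
print(to_game_frequencies(sample))
```

A lower `damping` value flattens the list further, which matches the advice to lower the common names by "at least" 50%.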

Quote:
How many names would you include into that file?
As many as you want. The Canadian and American last name lists will be 8000-9000 names long, so if you want to have a few thousand, no problem.

Thanks for the help. Again, take your time; it doesn't take long for us to incorporate them into the names file once we have them.