#1 |
|
Global Moderator
Join Date: Nov 2002
Location: Queens, NY
Posts: 9,848
|
My Lahman Database Project
I figured I'd describe this before I go to work on it.
I'm working on a "fixed" version of the Lahman database along the lines of Ankit's, but actually very different. The idea is this: for each player in major league history, I will combine each of his seasons with his career average. In other words, if a guy has only 1 AB in a particular season but he averages 500, he will get 251 for that season. If a player has a seasonal average of 25 homers in 500 AB, then for a season where he has 10 home runs in 50 AB, he will have 18 home runs in 275 AB.

The idea is to take the concept of a career average database, which makes every year of a player's career produce his career-average stats, and combine it with the fluctuations of each year. This way a guy who only got a few AB as a rookie but went on to have a nice, long career will get the benefit of the career, but his seasons will still have ratings and talents that are not all equal, giving room for change over the course of a career.

A couple of things Ankit's does that this won't:
- I will not have separate entries for hitters and pitchers for guys who did both. Ruth will have both skills. This is just a personal preference of mine.
- Every player in mine will show up in his actual first year, rather than in his first substantial year. As great a job as Ankit has done (and it is great), this is one thing that has always bothered me. Because I'll be using the career averages combined with real stats, this won't produce the problems it does with the regular Lahman DB.

I have a Thanksgiving break from classes, so I'll probably get working on it then. Does anyone have interest in this? And if so, does anyone want to host it? I'll probably put it up on Baseball Sim Central after it's done and I've tested it. Then others can test it and let me know if it works okay.
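For anyone who wants to check the arithmetic, here is a minimal Python sketch of the blend described above. The function names `half_up` and `blend` are made up for illustration, and the half-up rounding is an assumption inferred from the numbers in this post (250.5 becomes 251, 17.5 becomes 18); it is not taken from any actual tool.

```python
import math


def half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 going up (assumed rule)."""
    return math.floor(x + 0.5)


def blend(season: dict, career_avg: dict) -> dict:
    """Average each counting stat of one season with the career per-season average."""
    return {stat: half_up((season[stat] + career_avg[stat]) / 2) for stat in season}


# The two examples from the post.
one_ab_season = blend({"AB": 1}, {"AB": 500})
short_season = blend({"AB": 50, "HR": 10}, {"AB": 500, "HR": 25})

print(one_ab_season)  # {'AB': 251}
print(short_season)   # {'AB': 275, 'HR': 18}
```

Python's built-in `round` rounds halves to the nearest even number, so it would give 250 and 152 where the post's figures imply 251 and 153; that is why the sketch uses its own rounding helper.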
__________________
My music "When the trees blow back and forth, that's what makes the wind." - Steven Wright Fjord emena pancreas thorax fornicate marmalade morpheme proteolysis smaxa cabana offal srue vitriol grope hallelujah lentils |
|
|
|
|
#2 |
|
All Star Reserve
Join Date: Sep 2002
Location: Niles, MI
Posts: 933
|
I have no problem putting it up at the Download Center when it's done.
__________________
PSN: JoeRockEHF |
|
|
|
|
#3 |
|
Hall Of Famer
Join Date: Jan 2003
Posts: 2,616
|
Sounds interesting to me, keep us updated!
__________________
A wise man once said "shutup and drink your beer!"
|
|
|
|
|
#4 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,332
|
I'm constantly searching for the "perfect" database, so I'm interested.
If you need some kind of help, I'm free this week too.
__________________
2 Wild Cards, 11 Division Champs, 4 League Champs, 3 World Champs, and 3 Best GM awards Baseball Maelstrom - New York Mets - 180-149 .547 Corporate League Baseball - Coke Buzz - 889-649 .578 Western Hemisphere Baseball League - Santiago Saints - 672-793 .459 Record - 2428-2271 .517 |
|
|
|
|
#5 | ||
|
Global Moderator
Join Date: Nov 2002
Location: Queens, NY
Posts: 9,848
|
Quote:
Quote:
||
|
|
|
|
#6 |
|
All Star Reserve
Join Date: Sep 2002
Location: Niles, MI
Posts: 933
|
Yup, just go ahead and email me at JoeRockEHF@comcast.net.
|
|
|
|
#7 |
|
Major Leagues
Join Date: Apr 2004
Location: Philadelphia, PA
Posts: 378
|
I would be interested...would you be doing the same with the fielding data?
|
|
|
|
|
#8 |
|
Major Leagues
Join Date: Apr 2004
Location: Philadelphia, PA
Posts: 378
|
I would be interested in seeing how the results turn out, certainly compared to Ankit's. For those players who have just a few games in their first year, which is most players, your adjustment will basically substitute their career average per plate appearance for their first year. The rest of the years will "regress toward the mean", i.e., the career average, so you just end up cutting off the tails of the distribution. In other words, you are getting rid of poor performance years AND great performance years.

As an example, I applied your adjustment to Babe Ruth's batting stats to see what the stat lines would look like. In his first year, 1914, he only played in 5 games, so his 1914 rookie year is almost identical to his career average line per plate appearance. For the next few years, Ruth played in a limited number of games as a hitter, so for the years 1914-1917 his stat lines are basically similar. That is, batting avg, slugging, on-base, OPS, and HR% are close to his career average. The other years also regress to this mean. His lowest HR total is 16 (vs. several single-digit years) and his highest is 46 (vs. 60 and three other years of 50+). His batting average low is .315 (1934, when he hit .288) and his high is .371 (1923, when he hit .393). His OPS ranges from 1.052 to 1.280 (compared to 0.500-1.377); his career average was 1.161. (Note: I only counted BB in the OBA calculation for simplicity.)

Ruth's regular career averages and his ctorg-adjusted career averages would be EXACTLY the same, since you are including a constant (his career avg) each year. The only difference is that you are flattening out the distribution. Per plate appearance, your adjustment and Ankit's would be the same. The difference is that Ankit's rookie year would have a higher number of games played and thus more plate appearances.

How does OOTP calculate its ratings when importing rookies? If it is based on just the rookie year, then I am not sure there would be any difference between the two. If it is based on career totals, then there would be no difference between the two. If it is based on career averages, then there would be no difference. The only way there would be a difference is if OOTP factored in peaks in the years, for instance, if it accounted for a career average of 32 HRs with a peak year of 60. I don't believe it looks at the career distribution of a player.

What you are setting out to do is no small task, so I would hate to see it result in the same thing as Ankit's database.
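To make the "same average, flatter distribution" point concrete, here is a small Python sketch. The season list is made up for illustration (it is NOT Ruth's real season-by-season record); it was only chosen so that the per-season average comes out to 32 HR, the figure mentioned above. The name `blend_series` is hypothetical.

```python
from statistics import mean


def blend_series(seasons):
    """Average every season with the mean of all seasons, unrounded."""
    career_avg = mean(seasons)
    return [(s + career_avg) / 2 for s in seasons]


# Illustrative HR totals only; the mean is 32 by construction.
hr = [0, 4, 29, 54, 60, 46, 22, 41]
adjusted = blend_series(hr)

print(mean(hr), mean(adjusted))      # the career average is unchanged
print(max(hr) - min(hr))             # spread of the raw seasons
print(max(adjusted) - min(adjusted)) # spread after blending is half as wide
print(adjusted[0], adjusted[4])      # 0 HR becomes 16.0, 60 HR becomes 46.0
```

Each adjusted season is (season + average) / 2, so the deviations from the average are cut in half while the average itself stays put. Rounding each season to whole numbers afterwards can shift the totals by a homer or two.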
|
|
|
|
#9 |
|
All Star Starter
Join Date: Sep 2003
Posts: 1,571
|
This certainly is an interesting idea, but it would seem to be too big a task if the only difference between this DB and Ankit's DB ends up being the debut season.
But maybe I am not clear on the changes you are making. |
|
|
|
|
#10 |
|
Hall Of Famer
Join Date: Jun 2003
Location: Minneapolis, MN
Posts: 3,411
|
Sounds interesting enough. I'd be interested in giving it a shot once I get my computer fixed and can play OOTP again.
|
|
|
|
|
#11 | |
|
Global Moderator
Join Date: Nov 2002
Location: Queens, NY
Posts: 9,848
|
Quote:
You have a very good explanation of the mentality behind why I wanted to do this. It makes players who had off years still be worse in those off years, but not to some huge degree where it can't be overcome. The way it ends up working is that the player's career numbers are used to generate his talents, but the individual seasons are used to generate the season you import. It has much more of an effect on players who played substantially in their rookie years than it does on players who didn't. It also has a substantial effect on all players in the initial import.

Some examples:
- If you import the 1987 season, you will get a Wade Boggs with a power level that is higher than his career numbers. He will hit more like he hit in 1987 than he did in other seasons, but because his talents will still be career-based, he will return to his career form in later years.
- If you import 1987, you will get a Keith Miller on the Mets who is a little bit better than the real Keith Miller, but only for a season or so, just like the real Keith Miller, and he won't be so great that you'll want to start him all the time.
- One area where it makes a difference is stolen bases. If you import Tony Gwynn earlier in his career, when he stole lots of bases, you will get Tony Gwynn stealing lots of bases. Later in his career, he won't be as good. The early one won't be quite as good as he was then and the later one won't be quite as bad, but there will be differences.

The main thing I've noticed is that it creates bigger arcs in player careers. Because players are importing with numbers that are different from their total career numbers (more in line with their actual rookie seasons), you get a certain kind of player who gradually develops into another kind of player. There is a lot of development that goes on.

While a guy who had 2 AB in his rookie year won't end up being much different from the same guy generated by Ankit, a guy who had 500 AB in his rookie year will be significantly different. One example would be Roberto Alomar. He's not the best example, since only his average is significantly different in his rookie year, but he's the first guy with a lot of AB in his first year who comes to mind.

Selected stats from his rookie year: 545 AB, 145 H, .266 Avg, 24 2B, 9 HR, 24 SB
Career averages: 533 AB, 160 H, .300 Avg, 30 2B, 12 HR, 28 SB
What you get when you import him (rookie year): 539 AB, 153 H, .284 Avg, 27 2B, 11 HR, 26 SB, with talents like his career numbers.

For most of the numbers, there's not much difference, but you do have a guy who should hit about .284 in his rookie season and develop into a .300 hitter. He won't be that at first, but he should turn into it - or, he might not.

Here's a good example: Steve Finley.

1989, his first year: 217 AB, 54 H, .249 Avg, 2 HR
Career average: 529 AB, 146 H, .276 Avg, 18 HR
Rookie import: 373 AB, 100 H, .268 Avg, 10 HR, with talents like his career numbers

So you get a guy who is less of a power threat when he comes up and not quite as good an average hitter, and he develops gradually into a better power hitter. There is more variance in how guys turn out simply because of the difference between their career numbers and their import numbers. This means there's a reasonable chance these guys won't develop into the players they became in real life, and if they do, it will be a gradual change.

I hope this makes sense to people. The concept itself is one I've observed by doing this myself for the past two years.
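The Alomar and Finley import lines above can be reproduced with a short Python sketch. This assumes the counting stats are averaged and rounded half-up, and that the batting average is then recomputed as H / AB from the blended totals rather than averaged directly; both are assumptions that happen to match the numbers quoted here. `blend_line` and `half_up` are hypothetical names.

```python
import math


def half_up(x: float) -> int:
    """Round to the nearest whole number, with .5 going up (assumed rule)."""
    return math.floor(x + 0.5)


def blend_line(rookie: dict, career: dict) -> dict:
    """Blend counting stats, then derive AVG from the blended H and AB."""
    line = {k: half_up((rookie[k] + career[k]) / 2) for k in rookie}
    line["AVG"] = round(line["H"] / line["AB"], 3)
    return line


alomar = blend_line(
    {"AB": 545, "H": 145, "2B": 24, "HR": 9, "SB": 24},   # rookie year
    {"AB": 533, "H": 160, "2B": 30, "HR": 12, "SB": 28},  # career averages
)
finley = blend_line(
    {"AB": 217, "H": 54, "HR": 2},    # 1989
    {"AB": 529, "H": 146, "HR": 18},  # career averages
)

print(alomar)  # {'AB': 539, 'H': 153, '2B': 27, 'HR': 11, 'SB': 26, 'AVG': 0.284}
print(finley)  # {'AB': 373, 'H': 100, 'HR': 10, 'AVG': 0.268}
```

Recomputing the average from blended H and AB weights the two lines by playing time, which is why Finley lands at .268 instead of the plain midpoint of .249 and .276.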
|
|
|
|
|
#12 |
|
Major Leagues
Join Date: Apr 2004
Location: Philadelphia, PA
Posts: 378
|
I definitely like the idea of importing players in their actual rookie year, especially if they won't be penalized for having limited playing time. You get their real debut year but the potential for a more realistic early career progression.
I will definitely try it out, let me know when it's ready. |
|
|
|
|
#13 |
|
Hall Of Famer
Join Date: Mar 2003
Location: Madison, WI
Posts: 2,731
|
I'm interested. I haven't done a historical league in a long time so I'm getting interested in doing one again.
__________________
Formerly in the OTBA - Stockholm Royal Squirrels of Sweden OOTP Grand League Champion 2015 |
|
|
|
|
#14 |
|
Major Leagues
Join Date: Apr 2004
Location: Philadelphia, PA
Posts: 378
|
Database comparison
Regarding the rookie year issue and ctorg's proposed database, I wanted to see how players import under the different databases. Using Babe Ruth, I compared Lahman 5.2, ctorg's adjusted DB ((season + career avg) / 2), Ankit's DB, and Ankit's CareerAvg DB.

A review of the issues:
- Lahman has some missing data. AnkitDB fills in some of those holes and also cuts down the size of the database by eliminating anyone with a short career, i.e., the non-essential players.
- Players importing in their rookie years from Lahman do not develop as expected, primarily because of their limited number of games in their debut years. We know that there is a chance that players will not perform like they did in real life, but for many Lahman imports, including Ruth, there is no chance.
- AnkitDB solves some of these problems by removing those early years and importing players when they have a sufficient number of games, but for many players that is a few years after their actual debut year.
- However, some players in AnkitDB who have a sufficient number of games import with very low ratings because of poor early careers. Even though their talent figures might be promising, they won't get the opportunity to play.
- Ankit's career avg DB removes the risk of not developing by importing players with their career avg figures, so early poor years have little impact, but as with his regular DB, the players do not import in their actual debut year.
- Note: Ankit suggests using his regular DB when starting a historical league, then using his career avg DB to import rookies.
- Ctorg proposes to create a 'compromise-type of DB' by having players debut in their actual debut year but also have a chance at developing.

Ruth Rookie Year Import Analysis:
- See the table below.
- Remember that Ankit's rookie years are based on when a player has a reasonable number of games, so Ruth imports in 1918 rather than 1914.
- Because Ruth only played in 5 games as a hitter in 1914, Lahman imports him with very low ratings and no talent. There is no chance of him progressing to anything close to Ruth the hitter.
- Ctorg's ratings are better, maybe too high for a rookie, but the talent ratings are appropriate.
- AnkitDB's ratings are decent for Ruth as a rookie: not incredibly high contact or power, but certainly a very good prospect. Of course, I believe this was Ankit's intent when he eliminated early years in a player's career. Talent ratings are appropriate.
- Note that AnkitDB's talent ratings are similar to ctorg's. Keep in mind that if you import a player over and over again, the ratings will not be the same each time. They are close, within a few points, but they do vary. So ctorg's and Ankit's talent ratings are really the same, which makes sense because they are based on roughly the same career totals (there is also a small difference because Ankit cut out Ruth's early years and ctorg does not).
- Ankit's CareerAvgDB imports Ruth at a level higher than his regular DB and also higher than ctorg's, maybe too high for Ruth as a rookie. Talent numbers are appropriate and similar to ctorg's and AnkitDB's, again because they are all based on the same career.

Rookie Year Ratings
DB         Yr    Age  Contact  Gap  Power  Eye  Avoid K's
lahman     1914  19   14       21   3      1    10
ctorg      1914  19   82       79   100    134  61
AnkitDB    1918  23   65       122  56     101  57
AnkCarAvg  1918  23   92       79   110    145  65

Rookie Year Talent
DB         Yr    Age  Contact  Gap  Power  Eye  Avoid K's
lahman     1914  19   1        1    1      1    105
ctorg      1914  19   96       86   108    147  61
AnkitDB    1918  23   96       84   113    149  62
AnkCarAvg  1918  23   97       86   109    154  62

Conclusions:
- Lahman unadjusted will not produce good results. This is NOT because the database is bad (or, as some posts say, because Lahman sucks). The database is great, accurate, and useful (and free, remember). However, the way OOTP uses it may create historical sim inaccuracies beyond what we would expect from random fluctuations.
- Ankit's DB works well for most players, especially if you adjust the development modifiers so players can have a chance at reaching their potential. The only negatives are that players do not debut in their actual debut years, and solid career players who started out slow (while having sufficient playing time) may not import with good enough ratings. This is not Ankit's fault, just a reality, and Ankit minimizes a lot of this.
- Ankit's career DB as an import for rookies allows solid players with poor early careers to have a chance. The negatives are that the debut year is not accurate and players may import at too high a level.
- ctorg's proposed DB seems like a reasonable solution, combining all of the good qualities of Ankit's DBs and reducing the negatives (by the way, Ankit's pros far outweigh the cons). Players import in their actual debut years. Players should import with reasonable ratings, maybe high in some cases, but better since the ratings will be in between Ankit's DB and career avg DB. It will also differentiate players who had subpar rookie years but solid careers. Players with a lot of playing time in their rookie years will look the same as Ankit's career avg DB imports, but players with limited playing time or below-career-average performance in their rookie year will import at more appropriate levels, thus taking a little longer to develop (but at least having a chance to), similar to real life. And since OOTP seems to factor in career averages rather than peak years, talent ratings will not be affected. That is, ctorg's 'smoothing' adjustment to a player like Ruth (his 60 HRs in 1927 would be adjusted to 46) does not affect the way the talent ratings are calculated (because his 4 HRs in 1915 would be adjusted to 18). In other words, Ruth's career totals and averages in ctorg's DB would be the same as Ruth's actual career totals and averages.
- The only negative with ctorg's is that it's not done yet.
|
|
|
|
#15 |
|
Global Moderator
Join Date: Nov 2002
Location: Queens, NY
Posts: 9,848
|
And speaking of it not being done yet...
I made some progress over the weekend. I was hoping to finish it, but, alas, real life called and I had to answer. I'm going to try to get at least a preliminary version posted somewhere soon, hopefully within a few days. One thing I haven't decided is what to do with players who barely played. Should I keep everyone or eliminate those who only played in a game or two? One part of me wants to keep everyone just to be inclusive. I mean, who knows what kind of career John Paciorek would have had, right? Another part of me just wants to keep the size down and get rid of guys who just take up space. I'm leaning toward deleting them right now.
|
|
|
|
#16 |
|
Major Leagues
Join Date: Apr 2004
Location: Philadelphia, PA
Posts: 378
|
Get rid of non-essential players. It is very rare for someone who had such limited playing time to develop. In a lot of cases, their rookie year ratings far exceed their career, and their rookie ratings aren't that good anyway.
Size matters, so cut the file down. |
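If it helps, the cut could be as simple as a career-games threshold. This is only a sketch: the `min_games` cutoff of 3 is an assumption standing in for "a game or two", and the player keys are made-up identifiers, not actual Lahman IDs.

```python
def drop_short_careers(career_games: dict, min_games: int = 3) -> dict:
    """Keep only players whose career games meet the (assumed) threshold."""
    return {pid: g for pid, g in career_games.items() if g >= min_games}


# Hypothetical keys; career games played are the only input needed.
players = {"paciorek_john": 1, "ruth_babe": 2503}
kept = drop_short_careers(players)

print(kept)  # {'ruth_babe': 2503}
```

Raising or lowering `min_games` trades file size against inclusiveness, which is the choice being debated here.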
|
|
|
|
#17 |
|
Major Leagues
Join Date: Oct 2004
Posts: 467
|
Great stuff ctorg! Sounds like it will make for a great database!
So far I haven't really done any historical leagues, but after reading through this thread I'm finding myself wanting to try one out with the ctorg database.
__________________
a.k.a.: Obscene Change-up, Wicked Heater, Nasty Knuckler What is the sound of one light bulb turning itself? Crash Test Yankees! |
|
|
|
|
#18 |
|
Global Moderator
Join Date: Nov 2002
Location: Queens, NY
Posts: 9,848
|
I hope it lives up to expectations. I kind of wish I hadn't announced it ahead of time now. I mean, I wanted people to know I was working on it and would have something soon, and I wanted to get input, but it would have been cooler if I'd just sort of announced it and uploaded an initial version.
The most difficult part of the whole thing, to me, is doing fielding. Not that it's hard to figure out or anything, but it's really tedious because you have to do each position for each player.
|
|
|
|
#19 |
|
Hall Of Famer
Join Date: Jun 2004
Posts: 4,332
|
I vote get rid of them.
|
|
|
|
|
#20 | |
|
Major Leagues
Join Date: Oct 2004
Posts: 467
|
Quote:
The bottom line to me is that you discovered a good way to work with historical data that should improve on the results of already very good databases. Some people might experience some atypical results, but your reasoning on this seems very sound to me.
|
|
|