Old 06-07-2004, 12:48 PM   #1
jcato
Minors (Triple A)
 
Join Date: Jan 2004
Posts: 200
CatoBase and Win Shares

A while back, I said I'd look into how CatoBase calculates Win Shares, both to find any errors and to document the differences between CatoBase and the Win Shares book. My plan was to follow along with the book as James works through the 1998 St. Louis Cardinals and the 1932 Philadelphia Athletics.

To do this, I needed a database with the stats for those two seasons. Luckily, I found a CatoBase-style database that had most stats for all AL and NL teams since 1876 on my hard drive. I don't remember where I got it from nor who did it. I'm sure it was someone on this forum, so, whoever it was, thanks.

There were a couple of issues with this database, however. First, the stats didn't always total to the same numbers as in the book, although the differences were minor in all cases. Here's a chart showing some of the differences for the 1998 NL (the 1932 AL differences were even smaller):
Code:
                 Book  CatoBase
BFP            100162    101504
Runs            11932     11919
Hits Allowed    23279     23238
Walks Allowed    8743      8728
Strikeouts      17552     17530
HR Allowed       2585      2578
Second, this database is missing some of the lesser-used stats such as Hit by Pitch, Balks, and Wild Pitches. To keep the calculations on track, I manually entered these values from the book when necessary.

So, with the data in hand, I started going page by page and formula by formula through the Win Shares book. What I found falls into two categories. The first is differences in the methods used because of the stats available. OOTP tracks an amazing number of stats, but the Win Shares calculations need even more. The second is simply errors in the formulae.

The following are the issues I found in the first category.
  • CatoBase cannot use the 'E' section of the Runs Created formula. This section is an adjustment to the Runs Created estimate and is only used for recent teams where the data is available. It requires data with runners in scoring position (RISP) and with runners on base (ROB). OOTP tracks RISP but not ROB, so CatoBase skips this adjustment.

    E = Hrisp - (ABrisp * AVG) + HRrob - (ABrob * (HR / AB))

    As an example, in the Win Shares book, Mark McGwire in 1998 had a first RC estimate of 167.55591. The E section adjusted this to 171.54608. In the team context, he ended up with 165 Runs Created. In CatoBase, the first RC estimate is the same (167.55591), the second one is skipped, resulting in a final RC of 159.
  • Runs and home runs at Home and Away come from the bat_stats.csv and pitch_stats.csv files. There is one line for Home and one line for Away for each player, regardless of the number of teams he played on. Because of this, the exact number of runs and home runs at Home and Away for each team cannot be determined. However, we can estimate it by making the player's Home/Away stats for each team proportional to his totals for the entire season. For example, take a player who scored 51 runs total, 19 at home and 32 away. For team A, he scored 14 runs. His runs scored at Home for team A would be 14 * (19 / 51) = 5, and his runs scored Away would be 14 * (32 / 51) = 9. For team B, where he scored the other 37 runs, the same calculation gives 14 at Home and 23 Away. So, 5 + 9 + 14 + 23 = 51, which adds up to the correct number of runs scored for the season.
  • In Win Shares, Park Factors uses a 5 year average with the focus year weighted to equal the other four years. I use a 3 year average with the focus year weighted 3/5 and the other two years 1/5 each. Why? Why not? As James says, "There isn't any right way to figure Park Effects, because all the alternatives have a down side" (pg 86). So, this is the "wrong" way I choose to do it.
  • There isn't a good way to know if a park change occurs that should exclude that year's data from the 3 year average. The way it works is that if a team's abbreviation changes, it is assumed they moved into a different stadium. For example, in 1965, the Braves played in Milwaukee (ML1). In 1966, they moved to Atlanta (ATL). The park factor for the 65 Braves would include 1 part 1964, 3 parts 1965, and nothing from 1966. For the 66 Braves, it would include 3 parts 1966, 1 part 1967, but nothing from 1965. In contrast, the fact that the Yankees played in Shea Stadium in 1974-1975 would be missed. Since their abbreviation didn't change, each of those years would use the full 3 year average.
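
For anyone who wants to check the arithmetic in the last few items, here is a rough Python sketch of the Home/Away split and the 3 year park factor weighting. This is not CatoBase's actual code. The function names, the dictionaries keyed by year, and the single-year "raw" park factor input are all made up for illustration. I'm also assuming the split is rounded to the nearest whole run and that the weights are renormalized when a neighboring year is dropped (3 parts + 1 part divided by 4), which is how I read the "parts" wording above.

```python
def split_home_away(team_runs, season_home, season_away):
    """Split a player's runs for one team in proportion to his season totals.

    Rounding to the nearest whole run is an assumption; the post only
    shows the rounded results.
    """
    season_total = season_home + season_away
    if season_total == 0:
        return 0, 0
    home = round(team_runs * season_home / season_total)
    away = round(team_runs * season_away / season_total)
    return home, away


def weighted_park_factor(year, abbrev, raw_factor):
    """3 year park factor: 3 parts focus year, 1 part each neighboring year.

    abbrev and raw_factor are hypothetical dicts keyed by year for one
    franchise. A neighboring year is skipped when it is missing or when
    the team abbreviation differs from the focus year (assumed park change).
    """
    parts = [(3, raw_factor[year])]
    for neighbor in (year - 1, year + 1):
        if neighbor in raw_factor and abbrev.get(neighbor) == abbrev[year]:
            parts.append((1, raw_factor[neighbor]))
    total_weight = sum(weight for weight, _ in parts)
    return sum(weight * factor for weight, factor in parts) / total_weight


# The 51-run example from the post: 19 at home, 32 away for the season.
print(split_home_away(14, 19, 32))  # team A -> (5, 9)
print(split_home_away(37, 19, 32))  # team B -> (14, 23)
```

Note that rounding each piece separately is not guaranteed to add back up to the season total for every player, even though it does in this example.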

The second category was errors in the formulae. Surprisingly, there were only three:
  • Win Shares needs team double plays. CatoBase was summing the DPs for each player on the team and using that number for team DPs. This is wrong. Every player involved in a double play gets credit for it, so summing them overestimated team DPs by nearly 3 times. For future seasons, the team DPs will be imported from the teams.csv file. This cannot be easily done for seasons already in the database. After looking at many leagues, I determined that dividing the sum of player double plays by 2.89 gave the most accurate result. This error caused Fielding Win Shares to be too high and Offensive Win Shares to be too low. - FIXED
  • An error in the formula for expected DPs. - FIXED
  • An error in the formula for estimated runners on first base. - FIXED
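
To make the double play correction concrete, here is a minimal Python sketch of the fallback for seasons already in the database. The function name and the sample numbers are made up; only the 2.89 divisor comes from the description above, and this is not the actual CatoBase code.

```python
DP_DIVISOR = 2.89  # empirical divisor quoted above


def estimate_team_dp(player_dps):
    """Estimate team double plays from the per-player DP totals.

    Each double play is credited to every fielder involved, so the raw
    sum overcounts; dividing by DP_DIVISOR is the approximation used when
    the team total from teams.csv is not available.
    """
    return sum(player_dps) / DP_DIVISOR


# Hypothetical team whose fielders were credited with 289 DPs in total.
estimate = estimate_team_dp([100, 150, 39])
print(round(estimate))  # -> 100
```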


And, finally, here are the results:

Code:
1998 St. Louis Cardinals
Player              Book  CatoBase  Diff
M. McGwire, 1B       41      39       2
R. Lankford, OF      27      25       2
B. Jordan, OF        21      21       0
J. Acevedo, RP       13      14      -1
D. DeShields, 2B     15      14       1
R. Gant, OF          11      12      -1
T. Stottlemyre, SP   10      11      -1
M. Morris, SP        10      10       0
G. Gaetti, 3B         8       8       0
R. Clayton, SS        7       7       0
F. Tatis, 3B          5       7      -2
E. Marrero, C         6       6       0
K. Bottenfield, RP    6       5       1
J. Brantley, RP       5       5       0
J. Frascatore, RP     5       5       0
T. Lampkin, C         5       5       0
J. Mabry, OF          5       5       0
K. Mercker, SP        5       5       0
M. Petkovsek, RP      5       5       0
D. Osborne, SP        4       4       0
L. Painter, RP        4       4       0
M. Busby, RP          3       3       0
R. Croushore, RP      3       3       0
J. Drew, OF           3       3       0
C. King, RP           4       3       1
P. Polanco, SS        2       3      -1
D. Howard, 2B         2       2       0
J. Jimenez, SP        2       2       0
P. Kelly, 2B          2       2       0
W. McGee, OF          3       2       1
D. Oliver, SP         2       2       0
L. Ordaz, SS          2       2       0
T. Pagnozzi, C        1       2      -1
M. Aybar, SP          1       1       0
B. Hunter             0       1      -1
B. Witt, RP           1       1       0
Total               249     249       0
Pretty good, I think. While doing the 1932 team, I discovered that my earlier fix for estimated runners on first base was itself wrong. I think the CatoBase Win Shares for the 1998 Cardinals would be closer to the book values if redone with the corrected fix.

Code:
1932 Philadelphia Athletics
Player              Book  CatoBase  Diff
J. Foxx, 1B          40      40       0
L. Grove, SP         33      33       0
M. Cochrane, C       30      30       0
A. Simmons, OF       24      24       0
M. Haas, OF          17      17       0
E. McNair, SS        17      16       1
M. Bishop, 2B        15      15       0
G. Earnshaw, SP      15      16      -1
R. Walberg, SP       15      15       0
J. Dykes, 3B         14      15      -1
D. Cramer, OF        13      12       1
T. Freitas, SP       12      12       0
R. Mahaffey, SP      10      10       0
B. Miller, OF         8       8       0
D. Williams, 2B       5       5       0
L. Krausse, RP        4       4       0
E. Rommel, RP         3       3       0
S. Cain, SP           2       2       0
E. Coleman, OF        2       2       0
J. Heving, C          2       2       0
E. Madjeski, C        1       1       0
Total               282     282       0
I think these came out better for three reasons. First, the league totals in the database were closer to the book than the 1998 NL totals. Second, these were calculated with the corrected formula for estimated runners on first base. Third, the book didn't use the 'E' adjustment for Runs Created for this team.

Overall, I'm pretty happy with the way CatoBase does the calculations. When the next update comes out, I'd recommend checking the 'Recalculate All Years' on the Win Shares tab the first time you create pages. This may take a while on databases with many seasons, but I think it would be worth it for the more accurate Win Shares.
Old 06-07-2004, 02:32 PM   #2
Hammer755
Hall Of Famer
 
Join Date: Dec 2001
Location: Houston, TX
Posts: 2,348
Awesome stuff, Jeff. Thanks for putting the time into researching this.
Old 06-14-2004, 02:18 AM   #3
Eckstein 4 Prez
Hall Of Famer
 
Join Date: Dec 2002
Location: The OC
Posts: 6,358
Yeah, thanks Jeff. I'm still not sure why my totals were so screwy for my 1870's league, but I'll recalculate and see what happens.
__________________
Looking for an insomnia cure? Check out my dynasty thread, The Dawn of American Professional Base Ball, 1871.
Old 06-14-2004, 07:13 PM   #4
Malleus Dei
Hall Of Famer
 
Join Date: Dec 2001
Location: In front of some barbecue and a cold beer
Posts: 9,490
Good job on that!
__________________
Senior member of the OOTP boards/grizzled veteran/mod maker/surly bastage

If you're playing pre-1947 American baseball, then the All-American Mod (a namefiles/ethnicities/nation/cities file pack) is for you.

Quote:
Originally Posted by statfreak
MD has disciples.
Old 06-15-2004, 07:29 AM   #5
Big Train
Major Leagues
 
Join Date: May 2002
Location: Canada
Posts: 402
There is a bug in my database. It calculates one pitcher as having the win shares that 3 pitchers actually acquired that year. Is there any way I can edit this? [It is rather annoying because he is listed as the all time leader in single season win-shares.] Thanks, and great program, 6.0 rocks!