Monday, January 18, 2010

THE FUTURE SEASON FORECASTER

This is something I've been kicking around with Sal and Seung for a while now-- trying to come up with our own predictive tool for softball, like Baseball Prospectus's PECOTA. I've concluded that this is a basically useless endeavor-- incredibly difficult, with little payoff besides finding out if I'm as smart as I think I am. We're not professionals; Havelock will not be considering this when deciding whether to sign Alex or Zach to a multi-million-dollar deal to patrol center field. But the idea refuses to die.

We do have an incredible statistical database, with basic batting stats going back to 1983 and the full range of stats starting in 1985. Pitching was not always tracked diligently until Joe Gerber came along, and fielding is impossible to track at this level. Gerber, by the way, left me a valuable statistical database consisting of (I believe) 9 complete seasons from the 1990s and 2000s. For those seasons at least, I could (if I accept this mission) weed out the park effects from the different fields we've used. It's easier to hit in the winter games, for example. Hastings produces a lot of doubles. Dobbs produces cheap homers. When SFLOI used Dobbs as its primary field, there was a HR explosion of epic proportions, similar to what would happen if the Colorado Rockies moved to Mexico City with 275-foot fences and built eight clones of Ted Williams with DNA taken from his frozen head. When I was hitting at Dobbs in late 2008, I hit 7 of my 12 homers in a single month. Jeff Appell homered 19 times in a season, Glen Lawrence 17, Ian Lebowitz 16. Heckscher, with its spacious outfield (and, some claim, its difficult hitting backdrop), took away everyone's "power", though they were really the same hitters. Look in the lifetime stats-- the HR explosion is 1998-2001, and 2002-2005 is the dead ball era.
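For what it's worth, here is a rough sketch of how weeding out a park effect for homers might look. It is only one simple way to do it, and every number below is made up for illustration; none of it comes from the actual database. The function names `park_factor` and `neutralize` are my own.

```python
def park_factor(hr_at_park, ab_at_park, hr_elsewhere, ab_elsewhere):
    """Ratio of the HR rate at one field to the HR rate everywhere else.

    A value above 1.0 means the field inflates homers; below 1.0 means
    it suppresses them.
    """
    rate_here = hr_at_park / ab_at_park
    rate_elsewhere = hr_elsewhere / ab_elsewhere
    return rate_here / rate_elsewhere


def neutralize(hr_hit_at_park, factor):
    """Scale homers hit at a field back to a 'neutral field' equivalent."""
    return hr_hit_at_park / factor


# Hypothetical league totals: 60 HR in 1000 AB at a small field,
# 90 HR in 3000 AB at all other fields.
factor = park_factor(60, 1000, 90, 3000)

# A hitter with 7 HR at that field gets credit for fewer "neutral" homers.
neutral_hr = neutralize(7, factor)
print(round(factor, 2), round(neutral_hr, 2))
```

A real version would need enough at bats at each field to trust the rates, and probably separate factors for doubles and homers.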

So, obvious conclusion #1-- some ballparks make certain hitters look good. Singles hitters do not benefit from Dobbs Ferry. Big fly-ball hitters do, as do line-drive gap hitters.

In baseball predictions, age is a big factor, but I don't think that would be the case in softball. Most baseball players are finished by 35, but softball hitters are never "finished"; you could make a case that the decline phase only begins somewhere between 40 and 60. I would guess that how often a player plays, and how good their conditioning is, have a greater effect. So, I might look at how players perform after 200-AB seasons. Do they improve, or at least decline more slowly? Complicating this-- many of our players get at bats elsewhere, with other teams and pickup games. How can we measure that?
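One way the 200-AB question could be checked, sketched under assumptions: each season is a record of (player, year, at bats, some rate stat), and we compare the next-season change for players coming off heavy seasons against everyone else. The record layout, the function name, and the sample rows are all hypothetical.

```python
from statistics import mean


def next_season_changes(seasons, ab_threshold=200):
    """Split year-over-year changes in a rate stat by prior-season workload.

    seasons: iterable of (player, year, at_bats, rate) tuples.
    Returns (heavy, light): changes following seasons with at least
    ab_threshold at bats, and changes following lighter seasons.
    """
    by_key = {(player, year): (ab, rate) for player, year, ab, rate in seasons}
    heavy, light = [], []
    for (player, year), (ab, rate) in by_key.items():
        following = by_key.get((player, year + 1))
        if following is None:
            continue  # no next season to compare against
        change = following[1] - rate
        (heavy if ab >= ab_threshold else light).append(change)
    return heavy, light


# Made-up sample rows, for illustration only.
sample = [
    ("A", 2007, 250, 0.40), ("A", 2008, 240, 0.42), ("A", 2009, 100, 0.41),
    ("B", 2007, 120, 0.45), ("B", 2008, 130, 0.41),
]
heavy, light = next_season_changes(sample)
print(mean(heavy), mean(light))
```

This does nothing about at bats taken with other teams or in pickup games; those simply are not in the data.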

Well, that's all the stats I have in me for today, and my girlfriend will probably kill me if I don't get off her laptop soon. Nate Silver, one of the guys who designed PECOTA, later said that ballplayers perform pretty much like they did in past seasons. That's good enough for me. Maybe.

5 comments:

Brand Guys said...

Interesting. Is there a place for a Theo Epstein or Billy Beane in SFLOI?

Stat Lab, I'd look for a single number that helps us figure out who contributes the most to victories.

There is only one instance where we ever attempted to do something like this. Many years ago, we had a huge debate over what makes a player most valuable. Most home runs? Most RBIs? Most $ in one's pocket to loan another player so they don't end up on the street?

The discussion was lively and caused any number of players to quit our league and never speak to each other again.

So it did have an upside.

Havelock Hewes said...

I think Bill is kidding. Just one factual error. Dobbs was never our primary field. We used it during the fall and winter maybe one-third of our games - still, enough to make a difference.

The Stats Lab said...

Bill, I did read that debate as it appears on Larry's website. I have an excellent number to measure hitting-- it's run elements created per out made. It's not perfect-- I use arbitrary values for walks, doubles, homers, etc., rather than actual ones. Here's what it says about this year's MVP vote. Alex created 0.554 runs per out, but only batted 232 times, for a season value of 128.47. Derek created 0.529 runs per out, batting 249 times, for a season value of 131.70. I created 0.446 runs per out, but batted 305 times, for a season value of 135.96. That's pretty close-- Derek and Alex are more efficient with their at bats, but I am always there (all year, I missed only one weekend, for my sister's wedding), and performing at a pretty high level as well.
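As a quick check on the arithmetic, the season values quoted above look like the rate multiplied by times batted. That reading is inferred from the numbers, not stated outright, and because the rates are rounded to three places the products land within about 0.1 of the quoted totals rather than matching exactly.

```python
# (rate per out, times batted, quoted season value), all from the comment.
hitters = {
    "Alex": (0.554, 232, 128.47),
    "Derek": (0.529, 249, 131.70),
    "Stats Lab": (0.446, 305, 135.96),
}


def season_value(rate, times_batted):
    """Assumed formula: rate stat times number of times batted."""
    return rate * times_batted


for name, (rate, times_batted, quoted) in hitters.items():
    computed = season_value(rate, times_batted)
    print(f"{name}: computed {computed:.2f}, quoted {quoted:.2f}")

# Rank by the quoted season values, highest first.
ranking = sorted(hitters, key=lambda n: hitters[n][2], reverse=True)
print(ranking)
```

The point of multiplying by playing time is visible in the ranking: the lowest rate ends up with the highest season value because of the extra times batted.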

Now, I have a number for pitching. I decided that pitching is less important than hitting, because your fielders have something to do with your success as a pitcher, and there's no way of measuring fielding except by what our eyes observe on the field. That number is made by comparing a pitcher to the league average and multiplying the difference from league average by outs recorded. So a pitcher's score can be negative, and I think that makes sense, because a bad pitcher hurts his team's chance of winning. So I add 20.16 points to my total. Derek adds 14.44 to his. Alex remains unchanged.
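A minimal sketch of that pitching number, with one big assumption: the comment doesn't say which stat is compared to league average, so runs allowed per out is used here as a stand-in, and the inputs to `pitching_score` are invented. The hitting and pitching totals at the bottom are the ones quoted in these comments.

```python
def pitching_score(league_runs_per_out, pitcher_runs_per_out, outs_recorded):
    """Difference from league average, scaled by outs recorded.

    Positive when the pitcher allows fewer runs per out than the league,
    negative when he allows more. The choice of runs per out is an
    assumption, not the commenter's stated stat.
    """
    return (league_runs_per_out - pitcher_runs_per_out) * outs_recorded


# Hypothetical pitchers against a hypothetical league average of 0.50.
print(pitching_score(0.50, 0.45, 400))  # better than average: positive
print(pitching_score(0.50, 0.55, 400))  # worse than average: negative

# Quoted values: hitting season value plus pitching points.
hitting = {"Stats Lab": 135.96, "Derek": 131.70, "Alex": 128.47}
pitching = {"Stats Lab": 20.16, "Derek": 14.44, "Alex": 0.0}
totals = {name: round(hitting[name] + pitching[name], 2) for name in hitting}
print(totals)
```

With the pitching points added, the gap between the three widens: 156.12, 146.14, and 128.47.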

By the way, so you can see I'm not completely blowing my own horn for MVP-- my stat shows I had no business winning it last year, and Carl got jobbed.

Havelock Hewes said...

Ian,
Do we have a database for actual hitting and pitching in 2010? I see league leaders, but no stats.

Bill M said...

Since I'm a young player, you need access to my minor league stats to make accurate predictions.

Including league difficulty factors. Huntington little league was way badder-ass than Northport little league.