Monday, February 22, 2016

Baseball Forecasting

With spring training about to begin, people are starting to put out forecasts. How are these forecasts put together? There are three main methods.

One method is the wisdom of the crowds. FanGraphs' Fans projections are the purest instance of this method: many people each estimate a player's stats, and the estimates are averaged together. The idea is that, while individual people are going to be wrong, their errors will roughly form a bell curve around the true value, so the high and low guesses tend to cancel out. By collecting many estimates, we can get an average close to the real number.
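To make the averaging idea concrete, here's a minimal Python sketch. The guesses are invented for illustration only; they are not real FanGraphs submissions, and `crowd_forecast` is a hypothetical helper name.

```python
from statistics import mean


def crowd_forecast(estimates: list[float]) -> float:
    """Average many individual estimates into a single forecast."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return mean(estimates)


# Hypothetical fan guesses for one player's slugging percentage.
guesses = [0.480, 0.560, 0.510, 0.545, 0.530]
print(round(crowd_forecast(guesses), 3))  # 0.525
```

The individual guesses are spread out, but the low ones (.480, .510) and the high ones (.560, .545) pull against each other, which is the whole premise of the method.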

A second method is expert opinion. You get together a group of people who really know baseball and have them come to a conclusion. This is how Sports Illustrated does it.

The final, and most common, method is statistical projection. Stats can be projected by taking into account past performance, the reliability of the data, and the effect of aging on the player.

As an example, I'll forecast Miguel Cabrera's slugging percentage for 2016. The system I'll be using is the Marcels system, which is the simplest system I know of that will give reasonable results.

First, I need to find Cabrera's slugging percentage (SLG) for the past three years.

2015 - .534
2014 - .524
2013 - .636

Next, we take a weighted average. We assign last year's SLG a weight of 5, the year before a weight of 4, and the year before that a weight of 3. The weights add up to 12, so we divide by 12:

wSLG = (5 * SLG15 + 4 * SLG14 + 3 * SLG13)/12

(5 * .534 + 4 * .524 + 3 * .636)/12 = .556
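The weighted-average step translates directly into a few lines of Python. This is a sketch, not an official Marcels implementation; `weighted_three_year` is just a name I've picked, and it expects the seasons most recent first.

```python
def weighted_three_year(last: float, two_ago: float, three_ago: float) -> float:
    """5/4/3 weighted average, with the most recent season weighted heaviest."""
    return (5 * last + 4 * two_ago + 3 * three_ago) / 12


# Cabrera's SLG for 2015, 2014 and 2013, as listed above.
wslg = weighted_three_year(0.534, 0.524, 0.636)
print(round(wslg, 3))  # 0.556
```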

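Next, we look at his plate appearances (PA) for those years to determine how reliable the data is. The plate appearances are weighted the same way, and 1200 is added to the denominator, so the reliability r gets closer to 1 the more a player has played: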

r = (5 * PA15 + 4 * PA14 + 3 * PA13)/(5 * PA15 + 4 * PA14 + 3 * PA13 + 1200)

(5 * 511 + 4 * 685 + 3 * 652)/(5 * 511 + 4 * 685 + 3 * 652+1200) = 0.858
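The reliability step, sketched the same way. The 1200 PA constant comes from the formula above; `reliability` and `REGRESSION_PA` are hypothetical names.

```python
REGRESSION_PA = 1200  # constant added to the denominator in the formula above


def reliability(pa_last: int, pa_two_ago: int, pa_three_ago: int) -> float:
    """Weighted plate appearances divided by (weighted PA + 1200)."""
    weighted_pa = 5 * pa_last + 4 * pa_two_ago + 3 * pa_three_ago
    return weighted_pa / (weighted_pa + REGRESSION_PA)


# Cabrera's PA for 2015, 2014 and 2013.
r = reliability(511, 685, 652)
print(round(r, 3))  # 0.858
```

A player with no plate appearances at all gets r = 0, and r only approaches 1 as the weighted PA grows far beyond 1200.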

Next, we need a weighted league average:

wLgSLG = (5 * LgSLG15 + 4 * LgSLG14 + 3 * LgSLG13)/12

(5*.405+4*.386+3*.396)/12 = .396
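Because the league average uses the same 5/4/3 weights, one general helper can serve both the player line and the league line. Another sketch with made-up names (`weighted_average`, `WEIGHTS`), assuming values are listed most recent season first:

```python
WEIGHTS = (5, 4, 3)  # most recent season first


def weighted_average(values: list[float], weights: tuple[int, ...] = WEIGHTS) -> float:
    """Weighted average of seasonal values, most recent season first."""
    if len(values) != len(weights):
        raise ValueError("expected one value per weight")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# League SLG for 2015, 2014 and 2013.
wlg = weighted_average([0.405, 0.386, 0.396])
print(round(wlg, 3))  # 0.396
```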

Now we multiply the weighted average by the reliability r, then add the weighted league average multiplied by (1 - r). This pulls the player's number toward the league average, more strongly when r is low:

adjSLG = (wSLG * r) + (wLgSLG * (1 - r))

(.556 * .858) + ((1 - .858) * .396) = .533
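In code, the regression step is a simple blend. `regress_to_league` is a hypothetical name, and the range check on r is an extra safeguard I've added, not part of the original formula.

```python
def regress_to_league(w_player: float, w_league: float, r: float) -> float:
    """Blend the player's weighted stat with the league's, trusting the player by r."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return w_player * r + w_league * (1 - r)


adj = regress_to_league(0.556, 0.396, 0.858)
print(round(adj, 3))  # 0.533
```

With r = 1 the player's own number is used unchanged; with r = 0 the projection is just the league average.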

Next, we need to find the factor to account for his age.

ageFactor = 1 - ((age - 29) * .003)

1 - ((32 - 29) * .003) = .991
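The aging adjustment for older players is a one-liner. `age_factor_older` is a hypothetical name, and this version is only meant for players past 29.

```python
def age_factor_older(age: int) -> float:
    """Age adjustment for players older than 29: lose 0.3% per year past 29."""
    return 1 - (age - 29) * 0.003


print(round(age_factor_older(32), 3))  # 0.991
```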

Finally, we multiply our adjusted number by the age factor:

final = adjSLG * ageFactor

.991 * .533 = .529 (carrying unrounded intermediate values; multiplying the rounded figures gives .528)
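Putting every step together, here's an end-to-end sketch that runs the whole calculation without rounding in between. The helper names are my own, and the age factor is passed in precomputed so the function doesn't assume which aging formula applies.

```python
WEIGHTS = (5, 4, 3)  # most recent season first
REGRESSION_PA = 1200


def weighted(values: tuple[float, ...]) -> float:
    """5/4/3 weighted average, most recent season first."""
    return sum(w * v for w, v in zip(WEIGHTS, values)) / sum(WEIGHTS)


def marcel_slg(slg: tuple[float, float, float],
               pa: tuple[int, int, int],
               league_slg: tuple[float, float, float],
               age_factor: float) -> float:
    """Run every step of the projection without rounding in between."""
    w_slg = weighted(slg)
    w_lg = weighted(league_slg)
    weighted_pa = sum(w * p for w, p in zip(WEIGHTS, pa))
    r = weighted_pa / (weighted_pa + REGRESSION_PA)
    adj = w_slg * r + w_lg * (1 - r)  # regress toward the league average
    return adj * age_factor


# Cabrera, seasons 2015, 2014, 2013; age 32 as in the example above.
projection = marcel_slg(
    slg=(0.534, 0.524, 0.636),
    pa=(511, 685, 652),
    league_slg=(0.405, 0.386, 0.396),
    age_factor=1 - (32 - 29) * 0.003,
)
print(round(projection, 3))  # 0.529
```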

There are a few things to note. First, the age-factor formula given above applies only to players older than 29. For younger players, use the following instead:

ageFactor = 1 + ((29 - age) * .006)
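A sketch combining both aging formulas into one function. The two branches meet at 29, where each gives a factor of exactly 1, so it doesn't matter which branch handles age 29 itself. `age_factor` is a hypothetical name.

```python
def age_factor(age: int) -> float:
    """Piecewise age adjustment using the two formulas above, peaking at 29."""
    if age > 29:
        return 1 - (age - 29) * 0.003  # older players decline 0.3% per year
    return 1 + (29 - age) * 0.006      # younger players improve 0.6% per year


for age in (25, 29, 32):
    print(age, round(age_factor(age), 3))
```

Younger players gain twice as much per year as older players lose, so a 25-year-old gets a factor of 1.024 while a 32-year-old gets .991.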

The other issue is that the system is designed to use three years of stats. For any season a player is missing, use the league average for the stat being examined. Don't do the same with plate appearances, though: a missing season adds no plate appearances, which lowers the reliability.
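One way to handle missing seasons in code, assuming (as the note implies) that a missing season counts as zero plate appearances rather than a league-average number. The player below is hypothetical, `fill_missing` is an illustrative name, and `None` marks a missing season.

```python
from typing import Optional

WEIGHTS = (5, 4, 3)  # most recent season first
REGRESSION_PA = 1200


def fill_missing(player: list[Optional[float]],
                 league: list[float],
                 pa: list[Optional[int]]) -> tuple[list[float], list[int]]:
    """Substitute league averages for missing stats, but count missing PA as zero."""
    stats = [lg if s is None else s for s, lg in zip(player, league)]
    plate_apps = [0 if p is None else p for p in pa]
    return stats, plate_apps


# Hypothetical player with only a 2015 season; 2014 and 2013 are missing.
stats, pas = fill_missing([0.450, None, None], [0.405, 0.386, 0.396], [600, None, None])
weighted_pa = sum(w * p for w, p in zip(WEIGHTS, pas))
r = weighted_pa / (weighted_pa + REGRESSION_PA)
print(stats)       # [0.45, 0.386, 0.396]
print(round(r, 3))  # 0.714
```

A single 600 PA season leaves r at about .714, so this projection leans on the league average much more heavily than Cabrera's did.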

Clearly, this system could be improved. Rookies could be evaluated using minor league equivalencies and park factors instead of the league average. The weights given to the various years could probably be fine-tuned, and the same is probably true of the age-factor formula. In practice, a table of age factors would probably work better.
