by Clifford Blau
Numerous analysts have developed formulas to predict how many runs will result from any given combination of singles, doubles, walks, outs, etc. These formulas are typically verified using seasonal team data, and most of them are very accurate by that standard. However, they are normally used to estimate how many runs result from the production of an individual batter. The main problem is that, since there is no way of determining exactly how many runs an individual is responsible for, the accuracy of the formulas for this purpose is difficult to validate. A related problem, highlighted by Phil Birnbaum in the May 1999 issue of By The Numbers, is that the range of offense represented by teams is much narrower than the range of production among individuals. Therefore, a formula may work very well for a player of average performance, but not for a very good or very bad hitter. Birnbaum developed a formula that works well over a wider range of player performance.
Some years ago, I did a small study to test the accuracy of runs formulas, and with several new methods introduced recently, I decided to revive and expand that study. I applied the formulas to pitchers' statistics. The advantage of this is that we know not only how many singles, walks, homers, etc. a pitcher allows, but also how many runs. Additionally, a pitcher's statistics for a season offer a data set of approximately the same size as a batter's. In theory, the typical error in the estimated runs will be proportionately greater for an individual than for a team, as luck has fewer opportunities to even out. (While the runs formulas work well for team-seasons, they will not work well for individual games. This is because they assume an average number of uncounted events, such as reaching on errors and lost baserunners. They also assume that counted events have a normal distribution, but even for team-seasons there will be some variance from the norm. For smaller samples, such as an individual season or a team-game, the error will be greater.) A possible drawback of this approach is that the number of runs charged to a pitcher may not reflect the runs he deserved to allow, because of partial innings and relief pitchers. Also, the normal range of performance is narrower for pitchers than for hitters: few pitchers are good enough to allow under three runs per nine innings for a season, while those bad enough to yield more than seven don't usually pitch long.
I had originally used data from the 1984 and 1985 seasons, and I expanded the study to include the 1986 and 1987 seasons, since I had data for those years as well. Fortunately, 1987 was a high-offense year, so there were several pitchers who allowed runs at a high rate. The formulas evaluated were Bill James' Runs Created (RC; the Technical Version from the 1988 Abstract), Paul Johnson's Estimated Runs Produced (ERP), Extrapolated Runs (XR; introduced in the 1999 Big Bad Baseball Annual), and Phil Birnbaum's Ugly Weights (UW). These formulas are detailed in the appendix. Since I lacked double play data, I had to use a simplified version of XR called Extrapolated Runs Reduced (XRR). Results are presented using the standard deviation of the difference between the predicted and actual runs, the mean of that difference, and the coefficient of determination (the square of the correlation coefficient from a linear regression between predicted and actual runs).
I selected 119 pitchers, representing a wide range of performance. These pitchers worked an average of 164.4 innings, with a range of 34.3 to 271.7. They allowed a mean of 80.3 runs. The results of this test are shown in Table 1:

| Formula | Standard Deviation (runs) | Mean Error (runs) | Coefficient of Determination |
|---|---:|---:|---:|
| RC | 9.19 | 5.2 | .968 |
| ERP | 7.7 | 0.5 | .966 |
| XRR | 7.80 | -0.8 | .967 |
| UW | 8.15 | -1.9 | .968 |

In order to test whether Ugly Weights outperformed the others for very good and very bad pitchers, I divided the sample into four nearly equal-sized groups based on OPS allowed. The very good group, 29 pitchers with a mean of 193.9 innings and 57.6 runs allowed, had these results:

| Formula | Standard Deviation (runs) | Mean Error (runs) | Coefficient of Determination |
|---|---:|---:|---:|
| RC | 8.70 | 5.1 | .949 |
| ERP | 7.45 | 1.9 | .949 |
| XRR | 7.37 | 0.5 | .949 |
| UW | 6.78 | 1.6 | .957 |

The thirty pitchers in the very bad group, with an average of 94.5 innings pitched and 69.6 runs allowed, had these results:

| Formula | Standard Deviation (runs) | Mean Error (runs) | Coefficient of Determination |
|---|---:|---:|---:|
| RC | 8.05 | 5.3 | .959 |
| ERP | 7.77 | -1.2 | .963 |
| XRR | 7.80 | -2.2 | .966 |
| UW | 6.41 | -3.2 | .967 |

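
The quartile split by OPS can be sketched as follows. This is a hypothetical illustration rather than the author's actual procedure: the `OpponentLine` container and its fields are invented for the example, OPS allowed is taken as on-base average plus slugging average computed from the pitcher's opponents' totals, and the way a remainder is distributed among the groups is an arbitrary choice here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class OpponentLine:
    """Hypothetical container for a pitcher's opponents' batting totals."""
    name: str
    ab: int
    h: int
    tb: int       # total bases allowed
    bb: int
    hbp: int = 0
    sf: int = 0

    def ops(self) -> float:
        # On-base average: (H + BB + HBP) / (AB + BB + HBP + SF)
        oba = (self.h + self.bb + self.hbp) / (self.ab + self.bb + self.hbp + self.sf)
        # Slugging average: total bases per at-bat
        sa = self.tb / self.ab
        return oba + sa


def split_by_ops(pitchers: list[OpponentLine], groups: int = 4) -> list[list[OpponentLine]]:
    """Rank pitchers from lowest to highest OPS allowed and cut into nearly equal groups.

    When the count does not divide evenly, the first groups get one extra pitcher each.
    """
    ranked = sorted(pitchers, key=lambda p: p.ops())
    size, extra = divmod(len(ranked), groups)
    result, start = [], 0
    for i in range(groups):
        end = start + size + (1 if i < extra else 0)
        result.append(ranked[start:end])
        start = end
    return result
```

With this rule, 119 pitchers fall into groups of 30, 30, 30, and 29; the first group holds the pitchers with the lowest OPS allowed (the "very good" group).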
Then I hit upon another means of testing these formulas. I went through the major league box scores for July and August of 1993 and compiled the data for high- and low-offense games into groups of about eighteen games each, to approximate a full-time player's season. After making nine such groups each for good and bad offenses, I applied the formulas (using XR instead of XRR this time) and got the following results:

| Team | AB | BA | OBA | SA | Runs | ERP | RC | UW | XR |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 1 | 711 | .353 | .430 | .592 | 177 | 161 | 184 | 163 | 160 |
| 2 | 671 | .341 | .419 | .559 | 145 | 143 | 158 | 146 | 143 |
| 3 | 663 | .299 | .375 | .507 | 139 | 120 | 126 | 117 | 121 |
| 4 | 652 | .314 | .393 | .472 | 124 | 118 | 124 | 118 | 117 |
| 5 | 657 | .333 | .422 | .516 | 134 | 133 | 144 | 133 | 133 |
| 6 | 669 | .344 | .419 | .538 | 136 | 136 | 150 | 137 | 137 |
| 7 | 673 | .330 | .397 | .612 | 146 | 148 | 162 | 142 | 148 |
| 8 | 656 | .316 | .396 | .543 | 138 | 132 | 142 | 128 | 130 |
| 9 | 671 | .334 | .389 | .548 | 138 | 129 | 136 | 127 | 130 |
| Standard deviation of differences | | | | | | 9.3 | 10.3 | 10.3 | 9.4 |
| Coefficient of determination | | | | | | .76 | .77 | .74 | .75 |


| Team | AB | BA | OBA | SA | Runs | ERP | RC | UW | XR |
|---|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 10 | 574 | .202 | .260 | .282 | 43 | 40 | 41 | 42 | 40 |
| 11 | 586 | .212 | .274 | .285 | 37 | 44 | 44 | 45 | 45 |
| 12 | 572 | .187 | .250 | .260 | 36 | 34 | 37 | 34 | 34 |
| 13 | 578 | .220 | .282 | .282 | 40 | 45 | 46 | 43 | 46 |
| 14 | 579 | .197 | .230 | .282 | 35 | 33 | 36 | 35 | 31 |
| 15 | 577 | .217 | .267 | .267 | 31 | 37 | 39 | 34 | 37 |
| 16 | 587 | .233 | .303 | .334 | 49 | 63 | 61 | 60 | 61 |
| 17 | 579 | .207 | .271 | .285 | 38 | 43 | 42 | 44 | 44 |
| 18 | 578 | .218 | .270 | .334 | 43 | 51 | 50 | 52 | 52 |
| Standard deviation of differences | | | | | | 6.77 | 6.4 | 6.0 | 6.9 |
| Coefficient of determination | | | | | | .71 | .72 | .79 | .68 |
| Overall standard deviation (all 18 groups) | | | | | | 8.13 | 8.6 | 8.4 | 8.3 |

All of the formulas tended to overpredict runs for the weak-hitting teams, and all but RC underpredicted for the high scorers. A possible problem with this portion of the study is that the groups may be biased if they were, in effect, selected on the output (runs) rather than the input (hits, walks, etc.). However, I tried to select games based on the input, so hopefully this does not distort the results.
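
The summary rows of the two 1993 tables can be checked against the group data. Every "standard deviation of differences" figure, including the overall row, is reproduced if that statistic is taken as the root mean square of (estimate minus actual), i.e. spread measured about zero rather than about the mean error; the squared Pearson correlation between estimated and actual runs matches the coefficient of determination for the cases tested below. The sketch transcribes the tables (the variable names are illustrative) and also reports the mean error, whose sign shows the over- and underprediction just described. Whether Table 1 used the same definitions is an assumption the text does not confirm.

```python
import math

# Runs and estimates transcribed from the 1993 tables (groups 1-9 and 10-18).
ACTUAL_HIGH = [177, 145, 139, 124, 134, 136, 146, 138, 138]
ESTIMATES_HIGH = {
    "ERP": [161, 143, 120, 118, 133, 136, 148, 132, 129],
    "RC":  [184, 158, 126, 124, 144, 150, 162, 142, 136],
    "UW":  [163, 146, 117, 118, 133, 137, 142, 128, 127],
    "XR":  [160, 143, 121, 117, 133, 137, 148, 130, 130],
}
ACTUAL_LOW = [43, 37, 36, 40, 35, 31, 49, 38, 43]
ESTIMATES_LOW = {
    "ERP": [40, 44, 34, 45, 33, 37, 63, 43, 51],
    "RC":  [41, 44, 37, 46, 36, 39, 61, 42, 50],
    "UW":  [42, 45, 34, 43, 35, 34, 60, 44, 52],
    "XR":  [40, 45, 34, 46, 31, 37, 61, 44, 52],
}


def differences(pred, actual):
    """Estimated minus actual runs for each group (positive = overprediction)."""
    return [p - a for p, a in zip(pred, actual)]


def mean_error(pred, actual):
    d = differences(pred, actual)
    return sum(d) / len(d)


def rms_difference(pred, actual):
    """Root mean square of the differences, i.e. spread measured about zero."""
    d = differences(pred, actual)
    return math.sqrt(sum(x * x for x in d) / len(d))


def r_squared(pred, actual):
    """Squared Pearson correlation between estimated and actual runs."""
    n = len(actual)
    mp, ma = sum(pred) / n, sum(actual) / n
    sxy = sum((p - mp) * (a - ma) for p, a in zip(pred, actual))
    spp = sum((p - mp) ** 2 for p in pred)
    saa = sum((a - ma) ** 2 for a in actual)
    return sxy * sxy / (spp * saa)


if __name__ == "__main__":
    for name in ESTIMATES_HIGH:
        high = (ESTIMATES_HIGH[name], ACTUAL_HIGH)
        low = (ESTIMATES_LOW[name], ACTUAL_LOW)
        both = (ESTIMATES_HIGH[name] + ESTIMATES_LOW[name], ACTUAL_HIGH + ACTUAL_LOW)
        print(f"{name:3}  high: rms={rms_difference(*high):5.2f} mean={mean_error(*high):+6.2f} "
              f"r2={r_squared(*high):.2f}  low: rms={rms_difference(*low):5.2f} "
              f"mean={mean_error(*low):+6.2f} r2={r_squared(*low):.2f}  "
              f"overall rms={rms_difference(*both):.2f}")
```

Measuring spread about zero folds any systematic bias into the error figure, which is why it can exceed a conventional standard deviation for a formula that consistently over- or underpredicts.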
Conclusions:
This study has presented two methods of validating runs formulas for individuals. Using both pitchers' opponents' batting statistics and groups of team-game statistics, I found that Estimated Runs Produced performed best overall, although the differences among the four formulas were fairly small. All of the formulas correlate well with actual runs. However, the typical error in the estimates appears to be roughly 10%, which should be kept in mind when using them to evaluate hitters. It doesn't make sense to report an estimate of runs created to the nearest .01 of a run when the formula is only accurate to within about 10 runs. Based on this study, Ugly Weights may be more useful for very good and very bad hitters. In the second part of the study, Runs Created, the only non-linear formula of the four, was much more accurate in some cases, and much less in others, than the other three. Perhaps an examination of why that is so could lead to a more accurate formula. However, a significant increase in accuracy would likely require data on baserunning, errors, and timely hitting.
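
As a rough check on the 10% figure, dividing each formula's Table 1 standard deviation by the 80.3 runs allowed by the average pitcher in the sample gives the following ratios (a simple back-of-the-envelope calculation, not one reported in the study):

$$\frac{7.7}{80.3} \approx 9.6\%\ (\text{ERP}),\qquad \frac{7.80}{80.3} \approx 9.7\%\ (\text{XRR}),\qquad \frac{8.15}{80.3} \approx 10.1\%\ (\text{UW}),\qquad \frac{9.19}{80.3} \approx 11.4\%\ (\text{RC})$$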
Appendix:
Runs Created:
$$RC = \frac{(H + BB + HBP - CS - GIDP)\,\bigl[TB + 0.26\,(BB - IBB + HBP) + 0.52\,(SH + SF + SB)\bigr]}{AB + BB + HBP + SH + SF}$$
Estimated Runs Produced:
$$ERP = 0.16\,\bigl[2\,(TB + BB + HBP) + H + SB - 0.605\,(AB + CS + GIDP - H)\bigr]$$
Extrapolated Runs:
$$XR = 0.50\,\text{1B} + 0.72\,\text{2B} + 1.04\,\text{3B} + 1.44\,HR + 0.34\,(BB + HBP - IBB) + 0.25\,IBB + 0.18\,SB - 0.32\,CS - 0.09\,(AB - H - K) - 0.098\,K - 0.37\,GIDP + 0.37\,SF + 0.04\,SH$$
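Ugly Weights:
$$UW = 0.46\,\text{1B} + 0.80\,\text{2B} + 1.02\,\text{3B} + 1.4\,HR + 0.33\,BB + 0.3\,SB - 0.5\,CS - \bigl[0.687\,\mathit{ba} - 1.188\,\mathit{ba}^2 + 0.152\,\mathit{ip}^2 - 1.288\,\mathit{iw}\,\mathit{ba} - 0.049\,\mathit{ba}\,\mathit{ip} + 0.271\,\mathit{ba}\,\mathit{ip}\,\mathit{iw} + 0.459\,\mathit{iw} - 0.552\,\mathit{iw}^2 - 0.018\bigr]\cdot\mathit{outs}$$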
where ba = batting average, ip = isolated power (slugging average minus batting average), and iw = walks divided by at-bats.
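
For readers who want to apply the four formulas to a stat line, they can be transcribed into Python roughly as follows. This is a minimal sketch: the `BattingLine` container and its field names are hypothetical, singles and total bases are derived from hits and extra-base hits, W and BB in Runs Created are both read as total walks, and, because the appendix does not define "outs" for Ugly Weights, the code assumes outs = AB - H. The reduced XRR version used for the pitcher data is not given above, so it is not included.

```python
from dataclasses import dataclass


@dataclass
class BattingLine:
    """Counting stats for a batter, a pitcher's opponents, or a group of games.

    Field names are illustrative; categories without data can be left at 0.
    """
    ab: int
    h: int
    doubles: int = 0
    triples: int = 0
    hr: int = 0
    bb: int = 0      # all walks, including intentional ones
    ibb: int = 0
    hbp: int = 0
    k: int = 0
    sb: int = 0
    cs: int = 0
    gidp: int = 0
    sh: int = 0
    sf: int = 0

    @property
    def singles(self) -> int:
        return self.h - self.doubles - self.triples - self.hr

    @property
    def tb(self) -> int:
        return self.singles + 2 * self.doubles + 3 * self.triples + 4 * self.hr


def runs_created(s: BattingLine) -> float:
    """Technical Runs Created: on-base factor x advancement factor / opportunities."""
    on_base = s.h + s.bb + s.hbp - s.cs - s.gidp
    advancement = s.tb + 0.26 * (s.bb - s.ibb + s.hbp) + 0.52 * (s.sh + s.sf + s.sb)
    opportunities = s.ab + s.bb + s.hbp + s.sh + s.sf
    return on_base * advancement / opportunities


def estimated_runs_produced(s: BattingLine) -> float:
    return 0.16 * (2 * (s.tb + s.bb + s.hbp) + s.h + s.sb
                   - 0.605 * (s.ab + s.cs + s.gidp - s.h))


def extrapolated_runs(s: BattingLine) -> float:
    return (0.50 * s.singles + 0.72 * s.doubles + 1.04 * s.triples + 1.44 * s.hr
            + 0.34 * (s.bb + s.hbp - s.ibb) + 0.25 * s.ibb
            + 0.18 * s.sb - 0.32 * s.cs
            - 0.09 * (s.ab - s.h - s.k) - 0.098 * s.k
            - 0.37 * s.gidp + 0.37 * s.sf + 0.04 * s.sh)


def ugly_weights(s: BattingLine) -> float:
    ba = s.h / s.ab
    ip = s.tb / s.ab - ba    # isolated power: slugging average minus batting average
    iw = s.bb / s.ab         # walks per at-bat
    # Value of an out, which depends on the batter's own rates.
    out_value = (0.687 * ba - 1.188 * ba ** 2 + 0.152 * ip ** 2
                 - 1.288 * iw * ba - 0.049 * ba * ip + 0.271 * ba * ip * iw
                 + 0.459 * iw - 0.552 * iw ** 2 - 0.018)
    outs = s.ab - s.h        # ASSUMPTION: the appendix does not define "outs"
    return (0.46 * s.singles + 0.80 * s.doubles + 1.02 * s.triples + 1.4 * s.hr
            + 0.33 * s.bb + 0.3 * s.sb - 0.5 * s.cs - out_value * outs)
```

For example, a lone home run in one at-bat, `BattingLine(ab=1, h=1, hr=1)`, gives RC 4.0, ERP 1.44, XR 1.44, and UW 1.4; the large RC value comes from its multiplicative form, which magnifies extreme rates.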
Thanks to Cyril Morong and Phil Birnbaum for their comments on earlier versions of this article.
Email: CliffordBlau@yahoo.com