NAME: Barry Bonds' 2001 Plate Appearances
TYPE: Census
SIZE: 648 observations, 15 variables
DESCRIPTIVE ABSTRACT:
Data are provided for Barry Bonds' plate appearances in the 2001
baseball season. Variables include characteristics of the inning
before the first pitch to Bonds (e.g., the number of outs, whether a
runner is on each base, the score, the opposing pitcher's earned run
average) and after the first pitch to Bonds (e.g., the outcome of the
plate appearance, how many runs the Giants scored in the inning after
the first pitch to Bonds).
SOURCES:
The data were obtained from CBS Sportsline at
http://www.cbs.sportsline.com. This site has pitch-by-pitch summaries
of every baseball game in 2001.
Pitchers' ERAs were obtained from ESPN at http://www.espn.com.
These data were analyzed in Reiter, J. P. (2002) "Should teams walk or pitch to
Barry Bonds?" By the Numbers: The Newsletter of the SABR Statistical
Analysis Committee, 12 (November 2002), pp. 7-11.
VARIABLE DESCRIPTIONS:
Each plate appearance occupies one line of the text file.
Values are separated by spaces and occupy the fixed columns listed below.
Columns Description
1-3 Plate appearance number.
5-7 Number of the game in the season.
9 Number of the plate appearance within the game.
11 Equals one for games in San Francisco and equals zero otherwise.
13 Equals one when there is a runner on first base when
Bonds appears and equals zero otherwise.
15 Equals one when there is a runner on second base when
Bonds appears and equals zero otherwise.
17 Equals one when there is a runner on third base when
Bonds appears and equals zero otherwise.
19 Number of outs in inning when Bonds appears.
21-22 Inning of plate appearance.
24 Number of runs scored by the Giants in the inning after the first pitch to Bonds.
26 Equals one if Bonds walks and equals zero otherwise.
28 Equals one if Bonds walks intentionally and equals
zero otherwise.
30 Equals zero if Bonds does not reach base.
Equals one if Bonds reaches first base on a single or
error. Equals two if Bonds reaches second base on a double or
error. Equals three if Bonds reaches third base on a triple or
error. Equals four if Bonds hits a home run. Equals
five if Bonds walks or is hit by a pitch.
32-35 Opposing pitcher's career earned run average as of the
end of the 2000 season.
37-38 Giants' score just before the first pitch to Bonds.
40-41 Opposing team's score just before the first pitch to Bonds.
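As a sketch of how the layout above might be read, the following Python function slices each line at the documented column positions rather than splitting on spaces, so a blank field cannot shift the fields that follow it. The field names (pa, runs, walk, era, and so on) are hypothetical labels chosen for illustration. This file does not state how a missing ERA is coded, so the sketch assumes that a blank or non-numeric ERA field means missing.

```python
# Column layout from the VARIABLE DESCRIPTIONS, as 1-indexed, inclusive
# (start, end) positions. The field names are hypothetical labels.
COLUMNS = {
    "pa": (1, 3),             # plate appearance number
    "game": (5, 7),           # game number in the season
    "pa_in_game": (9, 9),     # plate appearance number within the game
    "home": (11, 11),         # 1 = game in San Francisco
    "on_first": (13, 13),
    "on_second": (15, 15),
    "on_third": (17, 17),
    "outs": (19, 19),
    "inning": (21, 22),
    "runs": (24, 24),         # Giants' runs in the inning after the first pitch
    "walk": (26, 26),
    "intentional_walk": (28, 28),
    "outcome": (30, 30),      # 0-5 coding described above
    "era": (32, 35),          # opposing pitcher's career ERA
    "giants_score": (37, 38),
    "opp_score": (40, 41),
}


def parse_line(line: str) -> dict:
    """Parse one plate appearance by its fixed column positions."""
    record = {}
    for name, (start, end) in COLUMNS.items():
        field = line[start - 1:end].strip()
        if name == "era":
            # Assumption: a blank or non-numeric ERA marks a missing value.
            try:
                record[name] = float(field)
            except ValueError:
                record[name] = None
        else:
            record[name] = int(field)
    return record


def read_file(path: str) -> list:
    """Return one dictionary per non-empty line of the data file."""
    with open(path) as f:
        return [parse_line(line) for line in f if line.strip()]
```

Slicing by position is a design choice; if every field is always present, splitting each line on whitespace would give the same values.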
NOTES:
A few games are missing because their web links were invalid, and I
could not locate the earned run averages of two pitchers. These missing
data should not bias analyses, since the reasons they are missing
(broken links, unavailable ERAs) are unrelated to what happened in the
games; that is, they are plausibly missing completely at random.
For rookie pitchers, who had no career ERA as of the end of the 2000
season, I used their 2001 earned run average.
STORY BEHIND THE DATA:
Barry Bonds is probably the best-known current baseball player.
His batting statistics in 2001, including a single-season record of 73
home runs, reflect arguably the greatest individual season of all
time. One common strategy employed by
opposing managers in 2001 was to walk Bonds rather than pitch to him.
This avoids the risk of Bonds hitting a home run, but it puts an
extra runner on base. This raises an interesting question in baseball
strategy: does walking Bonds rather than pitching to him reduce the
chance that the Giants will score runs? These data were collected to
examine this question. Analyses of the data suggest that the
differences between the two strategies are small; we cannot rule out
the possibility that walking and pitching to Bonds are equally
effective.
PEDAGOGICAL NOTES:
The 2001 data are in fact a census, so quantities for 2001 (e.g.,
average runs) are known exactly, apart from errors due to the missing
data. Inferential methods such as hypothesis tests and confidence
intervals are not needed to estimate 2001 quantities. However,
the 2001 quantities themselves are not of primary interest; rather, interest
centers on a hypothetical population of Barry Bonds' plate
appearances under conditions similar to those in the league in 2001.
We treat the data from 2001 as a random sample from such a
hypothetical population. This framework is commonly used when
analyzing sociological data that cover an entire population, such as
country-wide or state-wide data. Highlighting this conceptualization
helps students understand the differences between samples and censuses.
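Under this framework, standard two-sample methods apply. A minimal sketch, assuming the plate appearances can be treated as independent draws from the hypothetical population: a Wald 95% confidence interval for the difference between two proportions, such as the proportion of walk-innings and the proportion of hit-innings in which the Giants score. The Wald interval is chosen here for simplicity and is not necessarily the method used in Reiter (2002); the function name is hypothetical, and the counts x1, n1, x2, n2 would be tallied from the data.

```python
import math


def diff_in_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Wald interval for p1 - p2, treating each group as a random sample.

    x1 of n1 units in group 1 have the outcome, and x2 of n2 in group 2.
    Returns (difference, lower bound, upper bound).
    """
    p1, p2 = x1 / n1, x2 / n2
    # Standard error of p1 - p2 for two independent samples.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se
```

With small groups or proportions near zero or one, an interval with better coverage (for example, one based on a bootstrap) may be preferable.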
There are several outcome variables that can be used to compare
walking versus pitching to Bonds. One outcome is the percentage of
innings in which the Giants score at least one run. This is relevant in
situations in which the opposing manager does not want to give up any
runs. Another outcome is the number of runs the Giants score. Students
can derive either outcome from the runs variable (column 24): recode it
as one if at least one run scored and zero otherwise for the first
outcome, or use it directly for the second.
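A minimal sketch of both outcomes, computed separately for plate appearances in which Bonds walks and those in which he does not. It assumes each plate appearance has been parsed into a dictionary with the hypothetical keys "walk" (column 26) and "runs" (column 24); keying on column 28 instead would isolate intentional walks. This simple split does not control for game situation.

```python
def summarize(records, walk_field="walk"):
    """Summarize both run outcomes for each value of the walk indicator.

    Returns {walk_value: (n, percent of innings with at least one run,
    mean runs)}.
    """
    groups = {}
    for r in records:
        groups.setdefault(r[walk_field], []).append(r["runs"])
    summary = {}
    for key, runs in groups.items():
        n = len(runs)
        scored = sum(1 for x in runs if x >= 1)  # dichotomized outcome
        summary[key] = (n, 100.0 * scored / n, sum(runs) / n)
    return summary
```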
Bonds is not randomly assigned to be walked or pitched to, so the
data are observational rather than experimental. When comparing two
treatments (walking Bonds and pitching to Bonds) with observational
data, it is crucial to compare the distributions of causally relevant
background characteristics in the two treatment groups. In these data,
game situation (number of outs and runners on base) and pitcher's ERA
are the two most important causally relevant characteristics. To
control for game situation, students can split the data set by game
situation before comparing walk-innings (innings in which Bonds is
walked) with hit-innings (innings in which he is pitched to). For
example, innings in which Bonds appears with no one on base and no outs
can be grouped together, and outcomes when Bonds walks and when he is
pitched to can be compared within those innings. Comparisons of ERA
show that it is similarly distributed in walk-innings and hit-innings
in all scenarios, so ERA does not confound the causal conclusions.
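A minimal sketch of the stratified comparison described above. Records are grouped into game situations defined by the number of outs and the three runner indicators (columns 19, 13, 15, and 17), and within each situation the walk and non-walk groups are summarized side by side, with mean ERA included as a balance check. The dictionary keys (outs, on_first, on_second, on_third, walk, runs, era) are hypothetical names for those columns, and missing ERAs are assumed to be stored as None and skipped. Some situations may contain few or no walks, so within-situation comparisons can be unstable.

```python
from collections import defaultdict


def compare_within_situations(records):
    """Summarize outcomes and ERA by game situation and walk indicator.

    Returns {(outs, on_first, on_second, on_third): {walk_value: stats}}.
    """
    cells = defaultdict(lambda: defaultdict(list))
    for r in records:
        situation = (r["outs"], r["on_first"], r["on_second"], r["on_third"])
        cells[situation][r["walk"]].append(r)

    table = {}
    for situation, by_walk in cells.items():
        row = {}
        for walk, rs in by_walk.items():
            runs = [r["runs"] for r in rs]
            # Skip missing ERAs (assumed to be None) in the balance check.
            eras = [r["era"] for r in rs if r["era"] is not None]
            row[walk] = {
                "n": len(rs),
                "mean_runs": sum(runs) / len(runs),
                "pct_scoring": 100.0 * sum(x >= 1 for x in runs) / len(runs),
                "mean_era": sum(eras) / len(eras) if eras else None,
            }
        table[situation] = row
    return table
```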
SUBMITTED BY:
Jerome P. Reiter
Institute of Statistics and Decision Sciences
Duke University
Box 90251
Durham, NC 27708
jerry@stat.duke.edu