Dan Nettleton
University of Nebraska-Lincoln
Journal of Statistics Education v.6, n.2 (1998)
Copyright (c) 1998 by Dan Nettleton, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.
Key Words: Bivariate data; Confidence region; Paired comparisons; Scatterplot; Simultaneous confidence intervals.
Scores of 1997 Big Ten Conference men's basketball games involving the University of Iowa Hawkeyes are analyzed with a series of scatterplots accompanied by formal bivariate statistical inference. The analyses reveal that the Hawkeyes' defensive performance is largely unaffected by the site of the game, while offensive performance dips significantly in games played on opposing teams' courts.
1 Most students are familiar with the concept of home court advantage in college basketball. From small junior colleges to large universities, basketball teams tend to boast a greater winning percentage in games played on their home floor than in games hosted by the opposition. Since basketball can be considered in two basic phases, offense and defense, it is natural to ask how sensitive these phases are to game location. Such information may prove useful to players and coaches as they prepare for the upcoming season. We attempt to answer this question for one team based on their performance during one season of conference play.
2 The analyses presented can be conducted on any team's scores as long as they play several teams in a home and away format. This type of data should be readily available at most colleges and universities belonging to a conference. Students are likely to find the data and analyses interesting, especially if they feel they have a hand in creating the home court advantage. I do not know if conclusions drawn for the Iowa dataset are typical or an exception. It is certainly not the case that all schools have constant defensive effectiveness but improved offensive performance at home.
3 The University of Iowa Hawkeyes' 1997 Big Ten Conference season serves as a fine example of the home court advantage phenomenon. The Hawks were beaten only once in nine games played in Iowa City's Carver Hawkeye Arena. Their record on opposing teams' courts was much less impressive -- only four road wins to go with five road losses. The site, opponent, and score of each of these 18 games are contained in Table 1.
Table 1. Points scored in Iowa's 1997 Big Ten games, by site (a dash marks a game that was not played).
Opponent | Home: Iowa | Home: Opponent | Away: Iowa | Away: Opponent |
---|---|---|---|---|
Illinois | 82 | 65 | 51 | 66 |
Indiana | 75 | 67 | - | - |
Michigan | 80 | 75 | 71 | 79 |
Michigan State | - | - | 67 | 69 |
Minnesota | 66 | 68 | 51 | 66 |
Northwestern | 72 | 55 | 75 | 59 |
Ohio State | 76 | 62 | 69 | 56 |
Penn State | 81 | 55 | 69 | 57 |
Purdue | 84 | 62 | 59 | 56 |
Wisconsin | 78 | 53 | 48 | 49 |
4 As most collegiate sports fans know, the Big Ten actually contains eleven teams. To allow sufficient flexibility for scheduling non-conference opponents in 1997, each Big Ten team played a complete two-game home-and-away series with only eight Big Ten opponents. The other two opponents were played only once during the season. Because the Hawks played Indiana and Michigan State only once each in 1997, the scores of an away game against Indiana and a home game against Michigan State are unobserved data in Table 1.
5 While the analyses presented in the next section are most appropriate for a standard applied multivariate statistics course, the dataset is suitable for an elementary statistics course as well. By focusing on either offensive or defensive performance alone, the data can be used to illustrate the univariate paired t-test with corresponding confidence interval and/or the sign test. A question like "Does game location (home or away) affect offensive performance?" captures student interest. With some prompting, students are quick to point out the need to control for opponent in the analyses. Hence, pairing the data is perceived as a natural course of action.
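As a minimal sketch of the univariate analysis described above, the following Python code computes the paired t statistic and an exact two-sided sign test for points scored at home versus away, using the eight opponents in Table 1 that were played at both sites. The variable names are illustrative, and the critical value 2.365 (the 0.975 quantile of a t-distribution with 7 degrees of freedom) is an assumption hard-coded from standard tables rather than computed.

```python
import math
from statistics import mean, stdev

# Points scored by Iowa against the eight opponents played both home and away
# (Table 1 order: Illinois, Michigan, Minnesota, Northwestern, Ohio State,
# Penn State, Purdue, Wisconsin).
home = [82, 80, 66, 72, 76, 81, 84, 78]
away = [51, 71, 51, 75, 69, 69, 59, 48]

diffs = [h - a for h, a in zip(home, away)]  # home minus away, per opponent
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
d_bar = mean(diffs)
t_stat = d_bar / (stdev(diffs) / math.sqrt(n))

# Assumption: 0.975 quantile of t with n - 1 = 7 df, taken from a t table.
T_CRIT = 2.365
reject_t = abs(t_stat) > T_CRIT

# Exact two-sided sign test: under the null hypothesis, the number of positive
# differences is Binomial(m, 1/2), where m counts the nonzero differences.
m = sum(d != 0 for d in diffs)
n_pos = sum(d > 0 for d in diffs)
k = max(n_pos, m - n_pos)
p_sign = min(1.0, 2 * sum(math.comb(m, i) for i in range(k, m + 1)) / 2 ** m)

print(f"mean difference = {d_bar:.3f}, t = {t_stat:.3f}, reject at 0.05: {reject_t}")
print(f"sign test: {n_pos} of {m} differences positive, p = {p_sign:.4f}")
```

Replacing the two lists with points allowed at home and away gives the corresponding defensive analysis.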
6 In an applied multivariate course, I have asked the following somewhat open-ended question.
Dr. Tom Davis, University of Iowa men's basketball coach, is interested in knowing about differences in how his team performed at their home gym compared to on the road in Big Ten Conference games. (The phrase "on the road" refers to games played at the opposing school's gymnasium.) Specifically, he would like to know if his team's ability to score points and prevent points from being scored by the other team varies according to whether the game is played at the University of Iowa. The table below contains the final scores of sixteen Big Ten basketball games played by the University of Iowa this season. Note that the sixteen games consist of a home game and a road game with each of eight opponents. Analyze these data for Coach Davis. Include appropriate tests, confidence regions or intervals, graphs, etc. to support your conclusions. Be sure to provide a summary in terms the coach can understand (his Ph.D. is in history).
7 The point of the final sentence is not to disparage historians, basketball coaches, or any person, but rather, to encourage students to communicate their conclusions intelligibly to people who might not speak the technical language of statistics. Note that the question does not specify any particular test, confidence interval, or significance level. While this general phrasing makes grading more difficult, it adds realism by placing the student in the consultant's role. I supply the students with the scores of only sixteen games; the games with Indiana and Michigan State are excluded. The existence of these two scores and their utility are discussed in class.
8 The scatterplot in Figure 1 provides a nice pictorial summary of the information contained in Table 1. Each point can be labeled by opponent for a more complete (although more cluttered) view of the data. Figure 1 clearly illustrates two facts about the Hawkeyes' 1997 Big Ten season. First, they had a fairly good season, since most points fall above the reference line. Second, their performance at home was generally superior to their performance on the road, since the points corresponding to home games have a greater tendency to fall above the reference line than the points corresponding to away games.
Figure 1. Iowa's Big Ten Games.
9 To determine the impact of game location on the two aspects of the Hawks' performance, we use the natural bottom-line measures of offensive and defensive effectiveness at our disposal, i.e., points scored and points allowed, respectively. These two measures taken on any one game are dependent, since some games are very high-scoring affairs where both points scored and points allowed are likely to be high, while others are defensive battles in which neither team scores many points. In addition, the two measures may depend heavily on the opposing team's skill level and/or style of play. Thus, it is important to consider the bivariate nature of the data and to control for varying opponents in the analyses to follow.
10 Figure 2 and Figure 3 are scatterplots of points scored at home against points scored on the road and points allowed at home versus points allowed on the road, respectively. In both plots, the data are paired according to opponent. Because of the unobserved data, the Indiana and Michigan State scores are excluded from these plots and from the subsequent analyses.
Figure 2. Offensive Performance at Home Versus Away.
Figure 3. Defensive Performance at Home Versus Away.
11 These figures clearly suggest an answer to the question of interest. In Figure 3, most points fall quite near the reference line, suggesting that the points allowed are nearly constant for a given opponent. In contrast, the points of Figure 2 tend to fall above the reference line, indicating greater offensive production for the Hawks when playing at home.
12 The conclusions suggested by the exploratory analysis above can be confirmed formally using multivariate statistical techniques. For each opponent, consider the two-dimensional difference vector whose first and second components are points scored by Iowa at home less points scored by Iowa away and points allowed by Iowa at home less points allowed by Iowa away, respectively. These eight vectors can be considered a simple random sample from some distribution with unknown mean $\mu = (\mu_1, \mu_2)'$ and variance-covariance matrix $\Sigma$. A value of $\mu$ in the fourth quadrant ($\mu_1 > 0$, $\mu_2 < 0$) would suggest that both Iowa's offense and defense are more effective in home games.
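The construction of the difference vectors can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the author's own code: the names `GAMES`, `sample_cov`, `d1`, `d2`, `d_bar`, and `S` are hypothetical, and the scores are copied from Table 1 for the eight opponents played at both sites.

```python
from statistics import mean

# (opponent, Iowa home, opponent home, Iowa away, opponent away) from Table 1.
# Indiana and Michigan State are omitted because each was played only once.
GAMES = [
    ("Illinois", 82, 65, 51, 66),
    ("Michigan", 80, 75, 71, 79),
    ("Minnesota", 66, 68, 51, 66),
    ("Northwestern", 72, 55, 75, 59),
    ("Ohio State", 76, 62, 69, 56),
    ("Penn State", 81, 55, 69, 57),
    ("Purdue", 84, 62, 59, 56),
    ("Wisconsin", 78, 53, 48, 49),
]


def sample_cov(xs, ys):
    """Sample covariance with divisor n - 1 (the variance when xs is ys)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


# First component: points scored at home less points scored away.
d1 = [iowa_home - iowa_away for _, iowa_home, _, iowa_away, _ in GAMES]
# Second component: points allowed at home less points allowed away.
d2 = [opp_home - opp_away for _, _, opp_home, _, opp_away in GAMES]

d_bar = (mean(d1), mean(d2))            # sample mean vector
S = [[sample_cov(d1, d1), sample_cov(d1, d2)],
     [sample_cov(d1, d2), sample_cov(d2, d2)]]  # sample covariance matrix

print("differences:", list(zip(d1, d2)))
print("mean vector:", d_bar)
print("covariance matrix:", S)
```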
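13 Assuming that the underlying distribution is bivariate normal, the techniques outlined in Section 6.2 of Johnson and Wichern (1992) can be used to construct a 95% confidence region for the mean vector $\mu = (\mu_1, \mu_2)'$ and/or simultaneous 95% confidence intervals for $\mu_1$ and $\mu_2$. The sample mean and variance-covariance matrix of the eight vectors are
$$\bar{d} = \begin{pmatrix} 15.750 \\ 0.875 \end{pmatrix}, \qquad S = \begin{pmatrix} 144.214 & 21.821 \\ 21.821 & 17.554 \end{pmatrix}.$$
The point estimate of $\mu$, $\bar{d}$, suggests that playing at home benefits Iowa's offense an average of nearly 16 points while, perhaps, reducing defensive effectiveness slightly (around a single point on average). A 95% confidence region for $\mu$ is given by the set of points (x,y) satisfying
$$8 \begin{pmatrix} 15.750 - x & 0.875 - y \end{pmatrix} \begin{pmatrix} 144.214 & 21.821 \\ 21.821 & 17.554 \end{pmatrix}^{-1} \begin{pmatrix} 15.750 - x \\ 0.875 - y \end{pmatrix} \le \frac{7 \cdot 2}{6}\,(5.14325),$$
where 5.14325 is the 0.95 quantile of an F-distribution with 2 and 6 degrees of freedom.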
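To make the region concrete, here is a hedged sketch that checks whether a candidate mean vector (x, y) lies inside the 95% confidence region. The summary statistics are recomputed from the Table 1 differences (sums of squares and cross-products divided by 7), the F quantile is the one quoted in the text, and the function names `t2_statistic` and `in_region` are hypothetical.

```python
# Summary statistics of the eight difference vectors, computed from Table 1.
N, P = 8, 2                      # sample size and dimension
D_BAR = (15.75, 0.875)           # sample mean vector
S = [[1009.5 / 7, 152.75 / 7],   # sample covariance matrix (divisor n - 1)
     [152.75 / 7, 122.875 / 7]]
F_CRIT = 5.14325                 # 0.95 quantile of F(2, 6), as quoted in the text

# Right-hand side of the region's inequality: (n - 1) p / (n - p) * F.
BOUND = (N - 1) * P / (N - P) * F_CRIT


def t2_statistic(x, y):
    """Return n * (d_bar - mu)' S^{-1} (d_bar - mu) evaluated at mu = (x, y)."""
    (a, b), (_, c) = S
    det = a * c - b * b
    u, v = D_BAR[0] - x, D_BAR[1] - y
    # Quadratic form using the closed-form inverse of a symmetric 2x2 matrix.
    quad = (c * u * u - 2 * b * u * v + a * v * v) / det
    return N * quad


def in_region(x, y):
    """True if (x, y) belongs to the 95% confidence region for the mean."""
    return t2_statistic(x, y) <= BOUND


for point in [(15.75, 0.875), (10, 0), (0, 0)]:
    print(point, round(t2_statistic(*point), 3), in_region(*point))
```

A point such as (10, 0), with no defensive effect, falls inside the region, while (0, 0), with no effect of game location at all, falls outside it.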
14 This region, outlined in Figure 4, is the solid ellipse centered at (15.750, 0.875) with minor axis of length 9.13 and major axis of length 29.79, lying along the line y = 0.167 x - 1.755. The position of the confidence ellipse indicates that $\mu_1$ is positive while $\mu_2$ may very well be zero. This confirms the message of Figures 2 and 3; i.e., the Hawks' home offensive performance is generally superior to their road performance, while defensive effectiveness remains fairly constant.
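The axis lengths and orientation quoted above follow from the eigenvalues and eigenvectors of the sample covariance matrix. The sketch below shows one way to recover them, assuming the summary statistics computed from the Table 1 differences and the F quantile quoted in the text; all variable names are illustrative.

```python
import math

N, P = 8, 2
D_BAR = (15.75, 0.875)
# Entries of the sample covariance matrix, computed from Table 1.
S11, S12, S22 = 1009.5 / 7, 152.75 / 7, 122.875 / 7
F_CRIT = 5.14325                       # 0.95 quantile of F(2, 6), from the text
BOUND = (N - 1) * P / (N - P) * F_CRIT

# Eigenvalues of a symmetric 2x2 matrix in closed form.
half_trace = (S11 + S22) / 2
radius = math.hypot((S11 - S22) / 2, S12)
lam_major, lam_minor = half_trace + radius, half_trace - radius

# Each half-axis has length sqrt(eigenvalue * BOUND / n); double it for the
# full axis length.
major_len = 2 * math.sqrt(lam_major * BOUND / N)
minor_len = 2 * math.sqrt(lam_minor * BOUND / N)

# The eigenvector for lam_major is proportional to (S12, lam_major - S11),
# so the major axis has this slope and passes through the center D_BAR.
slope = (lam_major - S11) / S12
intercept = D_BAR[1] - slope * D_BAR[0]

print(f"major axis length = {major_len:.2f}, minor axis length = {minor_len:.2f}")
print(f"major axis line: y = {slope:.3f} x + ({intercept:.3f})")
```

The intercept computed with the unrounded slope can differ in the second decimal place from the -1.755 in the text, which is what results if the slope is first rounded to 0.167.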
Figure 4. A 95% Confidence Ellipse.
15 Bonferroni simultaneous confidence intervals tell a similar story. We can be 95% confident that both $\mu_1 \in (3.69, 27.81)$ and $\mu_2 \in (-3.33, 5.08)$. Hence, the true mean offensive benefit to playing at home is somewhere between 4 and 28 points, roughly speaking. According to the data, it is feasible that the impact on the defense is neutral. It is interesting to note that the simultaneous confidence intervals do not rule out the possibility $\mu_1 < \mu_2$, which would contradict the notion of an Iowa home court advantage. However, x > y for all points (x,y) in the 95% confidence region for $\mu$, supporting the impression of home court advantage conveyed by the data.
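Both claims in this paragraph can be checked numerically. The following is a sketch under stated assumptions: the t critical value 2.841 (the upper 0.05/4 = 0.0125 quantile of a t-distribution with 7 degrees of freedom) is hard-coded from standard tables, the summary statistics are computed from the Table 1 differences, and the smallest value of x - y over the confidence ellipse uses the standard result that a linear combination a'mu ranges over a'd_bar plus or minus sqrt(bound * a'Sa / n) on such a region.

```python
import math

N, P = 8, 2
D_BAR = (15.75, 0.875)
# Entries of the sample covariance matrix, computed from Table 1.
S11, S12, S22 = 1009.5 / 7, 152.75 / 7, 122.875 / 7

# Assumption: upper 0.05 / (2 * P) quantile of t with N - 1 = 7 df (t table).
T_CRIT = 2.841


def bonferroni_interval(center, variance):
    """Bonferroni interval for one component of the mean vector."""
    half_width = T_CRIT * math.sqrt(variance / N)
    return center - half_width, center + half_width


ci_offense = bonferroni_interval(D_BAR[0], S11)
ci_defense = bonferroni_interval(D_BAR[1], S22)

# Smallest value of x - y over the 95% confidence ellipse, i.e. the lower
# extreme of a'mu with a = (1, -1).
F_CRIT = 5.14325                               # 0.95 quantile of F(2, 6)
BOUND = (N - 1) * P / (N - P) * F_CRIT
var_diff = S11 - 2 * S12 + S22                 # a' S a for a = (1, -1)
min_x_minus_y = (D_BAR[0] - D_BAR[1]) - math.sqrt(BOUND * var_diff / N)

print("offense interval:", tuple(round(v, 2) for v in ci_offense))
print("defense interval:", tuple(round(v, 2) for v in ci_defense))
print("minimum of x - y over the ellipse:", round(min_x_minus_y, 3))
```

A positive minimum of x - y means every point of the ellipse satisfies x > y, even though the rectangle formed by the two Bonferroni intervals contains points with x < y.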
16 To validate the analysis, the assumption of bivariate normality should be verified. Using the Q-Q plot correlation coefficient test for normality described in Section 4.6 of Johnson and Wichern (1992), the hypothesis of univariate normality cannot be rejected at the 0.10 level of significance for either of the variables considered marginally. However, with only eight points, all but severe violations of normality are likely to go undetected. In addition, marginal normality does not guarantee the bivariate normality needed for the techniques above. Methods for assessing bivariate normality directly suffer from the same lack of power that is problematic in the univariate case. Although the basic conclusions of the analysis are not in doubt, the dataset could be used as motivation for multivariate nonparametric techniques in an advanced course.
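A minimal sketch of the Q-Q plot correlation coefficient for each marginal variable follows. It assumes the plotting positions (j - 1/2)/n for the standard normal quantiles; the source does not state which positions were used, so the resulting values may differ slightly from the author's. The function name `qq_correlation` is hypothetical, and the differences are those computed from Table 1.

```python
import math
from statistics import NormalDist, mean


def qq_correlation(values):
    """Correlation between ordered data and standard normal quantiles."""
    xs = sorted(values)
    n = len(xs)
    # Assumption: plotting positions (j - 1/2) / n for j = 1, ..., n.
    qs = [NormalDist().inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    mx, mq = mean(xs), mean(qs)
    sxq = sum((x - mx) * (q - mq) for x, q in zip(xs, qs))
    sxx = sum((x - mx) ** 2 for x in xs)
    sqq = sum((q - mq) ** 2 for q in qs)
    return sxq / math.sqrt(sxx * sqq)


# Home-minus-away differences for the eight paired opponents in Table 1.
offense_diffs = [31, 9, 15, -3, 7, 12, 25, 30]   # points scored
defense_diffs = [-1, -4, 2, -4, 6, -2, 6, 4]     # points allowed

r_offense = qq_correlation(offense_diffs)
r_defense = qq_correlation(defense_diffs)
print(f"Q-Q correlation, offense: {r_offense:.4f}")
print(f"Q-Q correlation, defense: {r_defense:.4f}")
```

Normality is rejected when the coefficient falls below a tabulated critical value for the given sample size and significance level, so each value would be compared against the appropriate table entry.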
17 Most students have an interest in sports as a participant, a spectator, or both. Among collegiate sports, basketball is certainly one of the most popular. Hence, many students are likely to find basketball score data appealing, especially if those data can be used to answer an interesting question about a team with which they are familiar.
18 The file hawks.dat.txt contains the raw data. The file hawks.txt is a documentation file containing a brief description of the dataset.
Columns
1 - 14 Iowa's opponent
17 Site of the game (H and A stand for home (Iowa City) and away)
19 - 20 Points scored by Iowa
22 - 23 Points scored by Iowa's opponent (points allowed by Iowa)
Values are aligned and delimited by blanks.
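Given the column layout above, one record could be read with fixed-width slicing, as in the sketch below. The `Game` type and `parse_record` function are hypothetical names, and the sample record is constructed here to match the documented layout rather than copied from hawks.dat.txt.

```python
from typing import NamedTuple


class Game(NamedTuple):
    opponent: str
    site: str              # "H" for home (Iowa City), "A" for away
    iowa_points: int
    opponent_points: int


def parse_record(line: str) -> Game:
    """Parse one fixed-width record using the documented 1-based columns."""
    return Game(
        opponent=line[0:14].strip(),      # columns 1 - 14
        site=line[16],                    # column 17
        iowa_points=int(line[18:20]),     # columns 19 - 20
        opponent_points=int(line[21:23]), # columns 22 - 23
    )


# Illustrative record built to match the layout (scores taken from Table 1).
sample = f"{'Michigan State':<14}  A 67 69"
game = parse_record(sample)
print(game)
```

Slicing by column position rather than splitting on blanks keeps multi-word opponents such as Michigan State in a single field.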
Johnson, R. A., and Wichern, D. W. (1992), Applied Multivariate Statistical Analysis (3rd ed.), New York: Prentice Hall.
Dan Nettleton
924 Oldfather Hall
Department of Mathematics and Statistics
University of Nebraska-Lincoln
Lincoln, Nebraska 68588-0323