NFL Y2K PCA

Mitchell Watnik and Richard A. Levine
University of California

Journal of Statistics Education Volume 9, Number 3 (2001)

Copyright © 2001 by Mitchell Watnik and Richard A. Levine, all rights reserved.
This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Multivariate analysis; National Football League (NFL); Summary ranking measures.

Abstract

The dataset associated with this paper is from the 2000 regular season of the National Football League (NFL). We use principal components techniques to evaluate team "strength." In some of our analyses, the first two principal components can be interpreted as measures of "offensive" and "defensive" strength, respectively. In other circumstances, the first principal component compares a team against its opponents.

1. Introduction

Our dataset is from the National Football League (NFL), but our work did not begin that way. We were interested in discussing the football team at our workplace, the University of California at Davis. The Aggies are an excellent football team, regularly win their league, and have been nationally ranked within NCAA Division II for a few years. In the 2000 season, the team made it to the semifinal game of the Division II championship series.

Just prior to that game, a local newspaper indicated that the Aggies' much-maligned defense was actually ranked in the top 20 in Division II. We did not believe that to be true. While we may be fair-weather fans of the Aggies, we felt that they were somewhat comparable to the 2000 version of the NFL St. Louis Rams -- all offense, no defense.

We decided to come up with a better way of ranking defenses, even going so far as to name the article before it was written ("The Best Defense is a Great Offense? Taking the Quarterback Out of Defense Rankings"). Our idea was that the amount of time that the defense is on the field is not typically accounted for in ranking defenses. A defense that is typically on the field for 25 minutes per 60-minute game is probably going to give up fewer points than a defense that is on the field for 30 minutes per game. Our first thought was to compute rates over time for touchdowns, yardage gained, and other accumulated statistics. A ranking of offenses could effectively use the same scoring technique as that for defenses (where a high score indicates a poor defense). We note that rates have been used previously for improving summary measures in sports statistics. For example, Anderson-Cook, Thornton, and Robles (1997) suggest a beautiful use of rates for evaluating power-play efficiency in hockey.

The first issue, of course, was to acquire data, but getting what we felt to be "necessary" information about Division II football teams is a formidable task. We therefore set out to rank NFL teams since the data are much more readily available.

The next problem is obvious: there is a lot of statistical information available for the taking. What is important and what is not important in ranking offenses and defenses is anyone's guess. Nonetheless, just as is the case with the title of this paper, a great deal of information can be effectively summarized using well-known dimension reduction techniques. We therefore employed the usual statistical methodology for when one has numerous variables, but a relatively small number of observations -- principal components (see Johnson and Wichern 1998 for an introduction). As noted in some detail in the sequel, the technique worked in almost textbook fashion.

2. The Dataset

These data consist of information from the 2000 regular season (not including playoffs) of the NFL. Most of the information was obtained via the NFL web site www.nfl.com, though some, particularly the information pertaining to starting field position, was obtained from www.foxsports.com. Every "rate" variable uses the total time of possession (the average time of possession times the number of games) as its denominator. The variables in the dataset are the number of touchdowns (touch), total offensive yards (yards), time of possession (top), rate of touchdowns (ratetd), number of sacks (sacks), rate of yards (rateyds), number of drives beginning in the "red zone" (drives20), number of drives beginning in "opponents' territory" (drives50), field goals attempted (fga), field goals made (fgm), number of punts (puntno), gross punt average (puntave), net punt average (puntnet), number of punts going for touchbacks (punttb), number of punts placed within the 20 yard line (punt20), longest punt (puntlong), punt rate (puntrate), number of punts blocked (puntblock), number of first downs (1sts), number of kickoffs (kos), amount of return yardage on the kickoff (koyds), average length of kickoff returns (koave), number of kickoffs returned for a touchdown (kotds), number of punts returned (rets), number of punts "fair caught" (fc), amount of punt return yardage (retyds), average length of punt returns (retave), number of punts returned for a touchdown (rettds), number of interceptions (int), and number of fumble recoveries (recover). Each of these pieces of information applies to both the team of interest and their opponents -- the former will be prefixed by "home" and the latter will be prefixed by "opp." We also have each team's wins and losses.
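As a small illustration of how a "rate" variable is formed, the following Python sketch divides a season total by the total minutes of possession. The function name and the input numbers are hypothetical and are not taken from the dataset; they only mirror the 25-minute versus 30-minute comparison made in the Introduction.

```python
def rate_per_minute(season_total, avg_minutes_per_game, games):
    """Season total divided by total minutes (average per game times games)."""
    return season_total / (avg_minutes_per_game * games)

# Hypothetical defenses: same touchdowns allowed, different time on the field.
GAMES = 16
rate_a = rate_per_minute(30, 25.0, GAMES)  # on the field 25 minutes per game
rate_b = rate_per_minute(30, 30.0, GAMES)  # on the field 30 minutes per game

print(f"defense A: {rate_a:.4f} touchdowns allowed per minute")
print(f"defense B: {rate_b:.4f} touchdowns allowed per minute")
```

On raw totals the two hypothetical defenses are tied; per minute on the field, defense B allows fewer touchdowns.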

3. Our Initial Analysis

Although we compiled this dataset, we have no doubt that ours will not be the final word on its analysis. Indeed, our hope is that students will come up with novel and statistically sound ways of summarizing and analyzing these NFL data.

We used SAS® "proc princomp" to perform the principal components analysis on the raw explanatory information, and, as will be seen, we tried various configurations of variables.
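For readers without SAS, the computation behind a principal components analysis can be sketched in a few lines of Python. This sketch assumes the analysis is carried out on standardized variables (that is, on the correlation matrix); the paper does not state which options were used. The three-variable data below are invented purely for illustration, and power iteration stands in for the full eigendecomposition that a statistical package would perform.

```python
import math

def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]

def correlation_matrix(z):
    """Correlation matrix of already-standardized rows."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
            for i in range(p)]

def first_component(corr, iterations=1000):
    """Leading eigenvector and eigenvalue of corr via power iteration."""
    p = len(corr)
    # Starting guess; it must not be orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    cv = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
    eigenvalue = sum(a * b for a, b in zip(v, cv))
    return v, eigenvalue

# Invented data: rows are teams; columns are touchdowns, yards,
# and opponents' touchdowns.
rows = [
    [40, 5500, 25],
    [35, 5200, 30],
    [28, 4800, 33],
    [50, 6000, 45],
    [22, 4300, 38],
]

z = standardize(rows)
corr = correlation_matrix(z)
loadings, eigenvalue = first_component(corr)
explained = eigenvalue / len(corr)  # share of the total variance
scores = [sum(a * b for a, b in zip(row, loadings)) for row in z]

print("loadings:", [round(a, 3) for a in loadings])
print("proportion of variance explained:", round(explained, 3))
print("scores:", [round(s, 3) for s in scores])
```

The loadings play the role of the coefficients reported in the tables of this paper, and each team's score is the loading-weighted sum of its standardized variables.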

Our first attempt at the analysis involved only a few variables, because, at the time, these variables were the only ones available. Furthermore, we had information only for the American Football Conference (AFC; about half of the teams in the NFL). The first two principal components, given in Table 1, explain almost 82% of the variation. The corresponding biplot (see, for example, Section 12.7 of Johnson and Wichern 1998 or Venables and Ripley 1997, pp. 388-389) from S-Plus® is given in Figure 1, where the abbreviations for the variables used in the figure are given in Table 1 and those for the teams are given in Table 2; teams that made the playoffs are indicated by asterisks in the tables and figures. To avoid confusion, we note that in Figure 1 there is a counterintuitive correspondence between the points and the graph labels. The lighter axes (those on the upper and right-hand parts of the plot) correspond to the darker points (the team names), and vice-versa (see Venables and Ripley 1997, p. 388).

Table 1. First Two Principal Components Using the American Football Conference (AFC) Data.

Variable First PC Second PC

hometop (htop) 0.289154 -0.268395
hometouch (htd) 0.393211 0.103344
opptouch (otd) -0.099851 0.417011
homeyards (hyd) 0.395883 0.054983
oppyards (oyd) -0.056191 0.410637
homeratetd (hrtd) 0.362950 0.177276
oppratetd (ortd) -0.002532 0.409282
homerateyds (hryds) 0.302423 0.233419
opprateyds (oryds) 0.190472 0.320558
oppdrives20 (odriv20) -0.239607 0.231254
oppdrives50 (odriv50) -0.321779 0.138639
home1sts (h1sts) 0.397584 0.070433
opp1sts (o1sts) -0.115843 0.371349

Table 2. Team Rankings Based on the Difference in the First Two Principal Components from Table 1 for All Teams in the National Football League. Asterisks (*) denote teams that made the NFL 2000 playoffs.

Overall Rank Team Conf. Wins Overall Score Offensive Score Off. Rank Defensive Score Def. Rank

1 TEN * AFC 13 5.38075 1.26479 8 -4.11596 2
2 BAL * AFC 12 4.66794 0.31373 12 -4.35421 1
3 WAS NFC 8 2.14909 0.25058 15 -1.89851 5
4 OAK * AFC 12 1.86825 2.92732 3 1.05907 21
5 MIA * AFC 11 1.84852 -0.67641 21 -2.52493 3
6 NYG * NFC 12 1.75330 0.67552 11 -1.07779 7
7 DEN * AFC 11 1.55059 4.16786 2 2.61728 28
8 PIT AFC 9 1.51409 -0.42227 19 -1.93636 4
9 NO * NFC 10 1.23657 1.05029 10 -0.18628 12
10 PHI * NFC 11 1.19001 0.02921 17 -1.16080 6
11 IND * AFC 10 1.18673 2.72832 4 1.54159 23
12 BUF AFC 8 1.02534 0.32804 13 -0.69730 10
13 JAX AFC 7 0.89662 1.07693 9 0.18031 13
14 NYJ AFC 9 0.86557 0.23653 16 -0.62903 11
15 TB * NFC 10 0.60569 -0.29121 18 -0.89690 9
16 STL * NFC 10 0.27546 4.78642 1 4.51096 31
17 GB NFC 9 -0.03321 0.31085 14 0.34406 15
18 DET NFC 9 -0.22774 -1.15149 23 -0.92375 8
19 KC AFC 7 -0.37763 1.31450 7 1.69212 25
20 MIN * NFC 11 -0.54273 1.94430 6 2.48703 27
21 SF NFC 6 -0.88587 2.33624 5 3.22212 30
22 CAR NFC 7 -1.15463 -0.59112 20 0.56351 17
23 NE AFC 5 -1.78054 -1.29337 24 0.48717 16
24 DAL NFC 5 -2.11107 -1.46166 25 0.64941 18
25 CHI NFC 5 -2.59514 -2.26068 27 0.33447 14
26 ATL NFC 4 -2.98308 -2.10208 26 0.88099 20
27 SEA AFC 6 -3.74492 -0.68732 22 3.05760 29
28 CIN AFC 4 -4.15162 -3.33577 30 0.81585 19
29 SD AFC 1 -4.65210 -3.03908 29 1.61302 24
30 ARI NFC 3 -4.84552 -2.47209 28 2.37343 26
31 CLE AFC 3 -6.09758 -4.90381 31 1.19377 22

Figure 1. Biplot of PC Values in Table 1.

We interpret the first principal component as an "offensive score," summarizing a team's offensive capabilities. The second principal component may be interpreted as a "defensive score," summarizing a team's defensive capabilities. In the case of the first principal component, a large positive score indicates a good offensive team (indicated by being further to the right in Figure 1); in the second, a large negative score indicates a good defensive team (indicated by being closer to the bottom in Figure 1). We regressed team win percentage on these two principal components; the marginal regressions are depicted in Figures 2a and 2b. The R² was 83%. We also found that the regression coefficients were close to equal in magnitude, though of opposite signs. In fact, a hypothesis test -- see, for example, Samaniego and Watnik (1997) -- showed no significant difference between the model using the two separate scores and the model using only their difference (offensive score minus defensive score), labeled "overall score" in Table 2 and Figure 2c.
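The "overall score" is simple arithmetic on the two component scores. As a check, the following Python sketch recomputes it for a handful of rows transcribed from Table 2; the function and variable names are our own labels, not part of the original analysis.

```python
# (team, offensive score, defensive score, published overall score), Table 2
table2_rows = [
    ("TEN", 1.26479, -4.11596, 5.38075),
    ("BAL", 0.31373, -4.35421, 4.66794),
    ("OAK", 2.92732, 1.05907, 1.86825),
    ("DEN", 4.16786, 2.61728, 1.55059),
    ("STL", 4.78642, 4.51096, 0.27546),
]

def overall_score(offensive, defensive):
    """Offensive score minus defensive score (a low defensive score is good)."""
    return offensive - defensive

computed = {team: overall_score(off, dfn) for team, off, dfn, _ in table2_rows}
for team, off, dfn, published in table2_rows:
    print(f"{team}: computed {computed[team]:.5f}, published {published:.5f}")
```

The computed and published values agree to within rounding of the fifth decimal place. The St. Louis row shows how the best offensive score can be almost entirely cancelled by the worst defensive score.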

Figure 2a. Plot of Offensive Scores Against Wins.

Figure 2b. Plot of Defensive Scores Against Wins.

Figure 2c. Plot of Overall Scores Against Wins.

Table 2 presents the offensive, defensive, and overall scores (as defined in the previous paragraph), using the first two principal components of Table 1, for every NFL team. The National Football Conference (NFC) teams in Table 2 provide a kind of "validation" of the AFC model. Six of the top seven AFC teams, according to this scoring criterion, made the playoffs. While the best NFC team according to this measure, the Washington Redskins, did not make the playoffs, the next five NFC teams did. (This could be taken as an indication that the Redskins' win-loss record did not live up to the team's performance, and thus as an explanation for the late-season firing of their coach.) As the reader might notice throughout, the Minnesota Vikings fared poorly in almost every model while the Washington Redskins tended to be overrated by the models. Interestingly, though not surprisingly, the St. Louis Rams had the NFL's best offense and worst defense according to our model. Three playoff-qualifying AFC teams, the Oakland Raiders, Denver Broncos, and Indianapolis Colts, had a similar, but less dramatic, imbalance.

Our attempt at principal components for the above variables using all of the NFL teams was a success. However, lest students think that principal component analysis would work on any subset, we note that our attempt using just the NFC teams was not successful. That is, the principal component analysis of the NFC data was not amenable to the same clear offensive and defensive interpretation as the analysis of the AFC data. We were obviously fortunate to have chosen the AFC as the conference whose data would be entered first.

We were also fortunate to have the principal components come out in such a desirable (for us) way. As the reader will see shortly, when all (or most) of the variables are included in the analysis, the principal components method tends to look directly at the difference between the "home" and "opp" measures. This leads us to believe that the imbalance in the variables in this model is what caused these interpretable components.

4. Analysis of the Full Dataset

Of course, principal components can handle a much larger number of variables. There is no reason for us not to use every variable at our disposal. For the AFC only, the first principal component, contained in Table 3, explained only 29% of the variation. Nonetheless, this principal component, in our opinion, is a direct measurement of the team against its opponents. Namely, this principal component almost always subtracts the contribution of the opposing team from the corresponding contribution of its team for the offense and vice-versa for the defense.

Table 3. First Principal Components for the AFC Data Using All Variables.

Variable First PC Variable First PC

hometop 0.218719
hometouch 0.134745 opptouch -0.185692
homeyards 0.136346 oppyards -0.149335
homeratetd 0.095218 oppratetd -0.141504
homerateyds 0.029558 opprateyds -0.006153
home1sts 0.139670 opp1sts -0.163813
homesacks 0.093923 oppsacks -0.092164
homeint 0.133793 oppint -0.084340
homerecover 0.057836 opprecover 0.014744
homekos -0.227016 oppkos 0.193279
homekoyds -0.176723 oppkoyds 0.154004
homekoave 0.137286 oppkoave -0.110405
homekotds -0.014834 oppkotds -0.017435
homedrives20 0.148039 oppdrives20 -0.175901
homedrives50 0.168546 oppdrives50 -0.197861
homefga 0.199397 oppfga -0.203328
homefgm 0.203278 oppfgm -0.197710
homepuntno -0.135343 opppuntno 0.110739
homepuntrate -0.190366 opppuntrate 0.173589
homepuntave -0.077663 opppuntave 0.101796
homepuntnet 0.021132 opppuntnet -0.051215
homepunt20 0.113390 opppunt20 -0.020490
homerettds 0.115540 opprettds -0.031943
homeretyds 0.165704 oppretyds -0.158089
homefc -0.024124 oppfc 0.084470
homerets 0.127991 opprets -0.162967
homepunttb 0.025041 opppunttb 0.054527
homeretave 0.124335 oppretave -0.082901
homepuntlong 0.003435 opppuntlong -0.051890
homepuntblock 0.024799 opppuntblock 0.051452

Using just that principal component, the regression on winning percentage for AFC teams provided an R² of 72%. In Table 4 and Figure 3, we show how the principal components matched with the teams' number of wins. Again, we used the NFC as a "validation" group. The top five AFC teams, according to this criterion, made the playoffs. Five of the top six NFC teams made the playoffs.

Table 4. Team Rankings Based on the First Principal Component from Table 3 for All Teams in the NFL. Asterisks (*) Denote Teams that Made the NFL 2000 Playoffs.

Rank Team Conference First PC score Wins

1 BAL * AFC 6.92874 12
2 TEN * AFC 6.62203 13
3 OAK * AFC 3.19638 12
4 DEN * AFC 3.05194 11
5 MIA * AFC 2.44269 11
6 GB NFC 1.56921 9
7 STL * NFC 1.50917 10
8 PIT AFC 1.38003 9
9 JAX AFC 1.30208 7
10 NYG * NFC 1.26682 12
11 TB * NFC 1.14953 10
12 IND * AFC 0.96163 10
13 PHI * NFC 0.49389 11
14 NO * NFC 0.40795 10
15 WAS NFC 0.22149 8
16 DET NFC -0.05276 9
17 CAR NFC -0.36193 7
18 NE AFC -0.47124 5
19 BUF AFC -1.19575 8
20 KC AFC -1.35185 7
21 NYJ AFC -1.48123 9
22 SF NFC -1.48244 6
23 DAL NFC -1.65470 5
24 MIN * NFC -1.69589 11
25 CHI NFC -2.40573 5
26 ATL NFC -2.58762 4
27 SEA AFC -2.78394 6
28 CIN AFC -5.20318 4
29 SD AFC -5.26384 1
30 ARI NFC -5.41074 3
31 CLE AFC -8.13448 3

Figure 3. Principal Component Scores Against Wins.

The first principal component for the entire dataset (including all of the variables and all of the teams), given in Table 5, explained only 21% of the variation. Again, as in Table 3, it appears to compare the team to its opponents directly.

Table 5. First Principal Components Using All Variables and All NFL Teams.

Variable First PC Variable First PC

hometop 0.255711
hometouch 0.135058 opptouch -0.201184
homeyards 0.131536 oppyards -0.196629
homeratetd 0.095602 oppratetd -0.153265
homerateyds 0.024881 opprateyds -0.043073
home1sts 0.146149 opp1sts -0.205652
homesacks 0.136492 oppsacks -0.073768
homeint 0.169915 oppint -0.106793
homerecover 0.072983 opprecover 0.003678
homekos -0.237560 oppkos 0.215731
homekoyds -0.192624 oppkoyds 0.189581
homekoave 0.058840 oppkoave -0.009941
homekotds -0.043506 oppkotds -0.014173
homedrives20 0.151142 oppdrives20 -0.176760
homedrives50 0.188694 oppdrives50 -0.176526
homefga 0.135393 oppfga -0.201820
homefgm 0.194181 oppfgm -0.181222
homepuntno -0.051007 opppuntno 0.153017
homepuntrate -0.130411 opppuntrate 0.212593
homepuntave -0.058642 opppuntave 0.104526
homepuntnet -0.015789 opppuntnet -0.021425
homepunt20 0.120944 opppunt20 0.029792
homerettds 0.082453 opprettds -0.018587
homeretyds 0.178650 oppretyds -0.165508
homefc 0.013974 oppfc 0.016949
homerets 0.142097 opprets -0.094260
homepunttb 0.054588 opppunttb 0.051486
homeretave 0.114561 oppretave -0.015596
homepuntlong 0.043231 opppuntlong -0.007458
homepuntblock 0.011052 opppuntblock 0.065411

Table 6. Team Rankings Based on the First Principal Component from Table 5 for All Teams in the NFL. Asterisks (*) Denote Teams that Made the NFL 2000 Playoffs.

Rank Team Conference First PC score Wins

1 BAL * AFC 8.04573 12
2 TEN * AFC 7.50834 13
3 DEN * AFC 3.54837 11
4 OAK * AFC 3.43575 12
5 MIA * AFC 2.91691 11
6 NYG * NFC 2.34373 12
7 PIT AFC 2.31205 9
8 JAX AFC 2.28250 7
9 TB * NFC 1.87205 10
10 GB NFC 1.66451 9
11 NO * NFC 1.38613 10
12 IND * AFC 1.26829 10
13 STL * NFC 1.05026 10
14 PHI * NFC 0.86671 11
15 WAS NFC 0.72484 8
16 DET NFC 0.70963 9
17 BUF AFC 0.02548 8
18 NE AFC -0.33104 5
19 NYJ AFC -0.64109 9
20 KC AFC -0.79622 7
21 CAR NFC -1.31142 7
22 SF NFC -1.73040 6
23 MIN * NFC -1.80792 11
24 DAL NFC -2.08162 5
25 CHI NFC -2.39768 5
26 ATL NFC -3.04457 4
27 SEA AFC -3.81363 6
28 SD AFC -4.52281 1
29 CIN AFC -5.08288 4
30 ARI NFC -6.86933 3
31 CLE AFC -7.53067 3

The R² for the regression of the number of wins on this principal component, as represented in Figure 4, was 73%. Here, the top five AFC teams and five of the top six NFC teams made the playoffs. Furthermore, the principal component correctly selected the Super Bowl opponents and outcome, as well as all of the AFC playoff outcomes. (With respect to the NFC playoff match-ups it was successful only half the time, correctly picking the New York Giants' victories over the Philadelphia Eagles and the Minnesota Vikings and the New Orleans Saints' victory over the St. Louis Rams.)
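The R² of a one-predictor regression can be recomputed directly from Table 6. The Python sketch below is a rough check and not the authors' SAS code; the function name is our own. With a single predictor, R² equals the squared correlation, so it is the same whichever of the two variables is treated as the response. If the table has been transcribed correctly, the result should land near the 73% reported in the text.

```python
def r_squared(x, y):
    """R-squared of the least-squares line relating y and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# (team, first PC score, wins) transcribed from Table 6
table6 = [
    ("BAL", 8.04573, 12), ("TEN", 7.50834, 13), ("DEN", 3.54837, 11),
    ("OAK", 3.43575, 12), ("MIA", 2.91691, 11), ("NYG", 2.34373, 12),
    ("PIT", 2.31205, 9), ("JAX", 2.28250, 7), ("TB", 1.87205, 10),
    ("GB", 1.66451, 9), ("NO", 1.38613, 10), ("IND", 1.26829, 10),
    ("STL", 1.05026, 10), ("PHI", 0.86671, 11), ("WAS", 0.72484, 8),
    ("DET", 0.70963, 9), ("BUF", 0.02548, 8), ("NE", -0.33104, 5),
    ("NYJ", -0.64109, 9), ("KC", -0.79622, 7), ("CAR", -1.31142, 7),
    ("SF", -1.73040, 6), ("MIN", -1.80792, 11), ("DAL", -2.08162, 5),
    ("CHI", -2.39768, 5), ("ATL", -3.04457, 4), ("SEA", -3.81363, 6),
    ("SD", -4.52281, 1), ("CIN", -5.08288, 4), ("ARI", -6.86933, 3),
    ("CLE", -7.53067, 3),
]

pc_scores = [score for _, score, _ in table6]
wins = [w for _, _, w in table6]
r2 = r_squared(pc_scores, wins)
print(f"R-squared of wins against first PC score: {r2:.3f}")
```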

Figure 4. Principal Component Scores Against Wins.

Finally, we summarize the data separately by offensive, defensive, and special teams variables. Tables 7, 8, and 9 present the relevant principal components. The first principal component for the offense explains 46% of the variation and the first principal component for the defense explains 49% of the variation. Note that the offensive principal component has very similar coefficients to the defensive principal component, with the obvious exception of time of possession. We feel that the difference between these two principal component scores gives an indication of overall team strength. In our attempt to summarize the special teams data, we found that considering the punting and kicking teams separately was superior to trying to do them both at once. Furthermore, we found that only the ability of the punting team had a significant effect on the number of wins. Thus the variables used in Table 9 consist only of punting statistics. The first principal component in Table 9 seems to represent the return capabilities of a team, though it explains only 22% of the variation. The second and third principal components appear most interpretable, summarizing the abilities of the home punting team and the opposing punting team. They explain 16% and 13% of the variation, respectively. We use these latter two principal components to devise a punting score for evaluating the teams.

Table 7. First Principal Component Summarizing Offensive Variables for Data from All NFL Teams.

Variable First PC

hometouch 0.343352
homeyards 0.348787
hometop 0.232946
homeratetd 0.323398
oppsacks -0.132067
homerateyds 0.288800
homedrives20 0.053379
homedrives50 0.085324
homefga 0.119071
homefgm 0.118911
home1sts 0.355936
home1rate 0.351317
homepuntno -0.291337
oppint -0.096390
opprecover -0.100766
homepuntrate -0.328446

Table 8. First Principal Component Summarizing Defensive Variables for Data from All NFL Teams.

Variable First PC

hometop -0.294770
oppyards 0.326010
opptouch 0.303064
homesacks -0.179711
opprateyds 0.179022
oppratetd 0.259483
oppdrives20 0.181012
oppdrives50 0.148804
oppfga 0.221289
oppfgm 0.202965
opp1sts 0.324007
opp1rate 0.332077
opppuntno -0.265742
homeint -0.195805
homerecover -0.111704
opppuntrate -0.318653

Table 9. First Three Principal Components Summarizing Punting Variables for Data from All NFL Teams.

Variable First PC Second PC Third PC

opppuntave 0.021507 0.006500 0.591860
opppuntnet 0.381688 -0.177946 0.290832
opppunttb -0.003666 -0.038939 0.162348
opppunt20 0.102521 -0.140235 -0.266048
opppuntblock 0.087499 0.222528 0.024679
homepuntave 0.216575 0.523767 -0.017731
homepuntnet -0.072828 0.519856 -0.101296
homepunttb 0.139907 0.355150 -0.181628
homepunt20 -0.173346 0.180112 0.240860
homepuntblock -0.229419 -0.116809 0.179845
homefc -0.046043 -0.054086 -0.522971
oppfc -0.209811 -0.313148 -0.110353
homeretave -0.441252 0.182222 0.149668
oppretave 0.431589 0.108961 0.122223
homerettds -0.377793 0.122072 -0.017254
opprettds 0.333716 -0.140810 -0.100679

The "total score" in Table 10 is computed as the offensive score (column 6 of the table and Figure 5a) minus the defensive score (column 7 and Figure 5b) plus one-half the punting score (column 8 and Figure 5c). These weights were suggested by the regression of win percentage on all three scores. This regression had an R² of 81%. Not surprisingly, the R² for the regression of the number of wins on this "total score" (see Figure 5d) was also 81%. Here, the top six NFC teams and the top five AFC teams made the playoffs. Interestingly, the Raiders' and Jets' punt teams push them up in the rankings. The Vikings, who have the best punting special team according to this analysis, also fare better. On the other hand, the Redskins are hurt by their punting team.
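The weighting just described can be checked against the published totals. The short Python sketch below uses four rows transcribed from Table 10; the function name is our own label.

```python
# (team, offensive score, defensive score, punting score, published total)
table10_rows = [
    ("TEN", 1.05733, -6.83163, 0.57723, 8.1776),
    ("OAK", 3.64252, -0.80435, 2.40616, 5.6500),
    ("BAL", 0.50198, -6.34898, -2.75720, 5.4724),
    ("MIN", 2.29798, 2.73073, 4.44349, 1.7890),
]

def total_score(offensive, defensive, punting):
    """Offensive score minus defensive score plus one-half the punting score."""
    return offensive - defensive + 0.5 * punting

totals = {team: total_score(off, dfn, punt)
          for team, off, dfn, punt, _ in table10_rows}
for team, off, dfn, punt, published in table10_rows:
    print(f"{team}: computed {totals[team]:.4f}, published {published:.4f}")
```

For these rows the computed totals match the published ones to four decimal places. The Minnesota row shows the punting term at work: the offensive and defensive scores alone would give a negative total, and the punting score lifts it above zero.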

Table 10. Team rankings based on a summary of the offensive, defensive, and punting special teams play of each NFL team. The offensive and defensive principal components come from Tables 7 and 8. The punting summary measure consists of the difference of the second and third principal components from Table 9. The total score upon which the ranking is based consists of the offensive score minus the defensive score plus one-half the punting special teams score. Asterisks (*) denote teams that made the NFL 2000 playoffs.

Rank Team League Wins Total score Off. score Def. score Punting score

1 TEN * AFC 13 8.1776 1.05733 -6.83163 0.57723
2 OAK * AFC 12 5.6500 3.64252 -0.80435 2.40616
3 BAL * AFC 12 5.4724 0.50198 -6.34898 -2.75720
4 DEN * AFC 11 4.5540 5.32779 -0.54343 -2.63447
5 STL * NFC 10 3.8212 6.30485 2.39239 -0.18245
6 NYG * NFC 12 3.3699 1.01238 -2.74760 -0.78009
7 IND * AFC 10 3.0062 3.26709 0.42423 0.32670
8 JAX AFC 7 2.9492 1.52267 -1.93469 -1.01628
9 NYJ AFC 9 2.8705 0.40729 -1.03838 2.84974
10 MIA * AFC 11 2.8688 -1.70385 -2.72417 3.69706
11 NO * NFC 10 3.0257 1.24803 -1.90010 -0.24487
12 PHI * NFC 11 2.1248 -0.30602 -1.01290 2.83591
13 TB * NFC 10 2.2483 -0.26549 -1.66600 1.69555
14 MIN * NFC 11 1.7890 2.29798 2.73073 4.44349
15 PIT AFC 9 1.5447 -0.48368 -1.87234 0.31219
16 GB NFC 9 1.4350 1.02025 -1.21816 -1.60686
17 WAS NFC 8 0.5425 0.31602 -1.80931 -3.16574
18 BUF AFC 8 -0.1661 -0.06044 -1.70203 -3.61544
19 DET NFC 9 -0.4335 -1.80718 -0.80075 1.14591
20 KC AFC 7 -0.8891 0.77321 1.13140 -1.06190
21 NE AFC 5 -2.1315 -1.65604 1.15651 1.36214
22 DAL NFC 5 -2.0687 -0.96984 1.92338 1.64898
23 SF NFC 6 -2.5398 2.53445 3.06935 -4.00972
24 CAR NFC 7 -3.4113 -0.25456 1.59191 -3.12961
25 CHI NFC 5 -3.6990 -3.40067 0.23609 -0.12443
26 SD AFC 1 -4.5464 -3.27735 2.14002 1.74198
27 ATL NFC 4 -4.8164 -2.61938 2.18382 -0.02650
28 SEA AFC 6 -6.0889 -1.15941 4.24062 -1.37769
29 CIN AFC 4 -6.6120 -4.14713 2.73793 0.54603
30 ARI NFC 3 -7.1163 -2.28034 4.88523 0.09852
31 CLE AFC 3 -10.9308 -6.84243 4.11121 0.04566

Figure 5a. Offensive PC Scores Against Wins.

Figure 5b. Defensive PC Scores Against Wins.

Figure 5c. Punting PC Scores Against Wins.

Figure 5d. Total Scores Against Wins.

6. Other Uses

This dataset need not be limited to use in multivariate statistics courses. For example, one could discuss whether teams in the NFC score more touchdowns than teams in the AFC (and whether it is appropriate to use a two-sample t-test for these data). There are innumerable regression models that could be explored as well, but, as part of that, an interesting discussion could result from pointing out that the assumption of independence of observations is not met in this situation. Many students will recognize that the problem is not that, say, homeint and oppint are related (though there is collinearity), but that the numbers of wins are not independent across teams, since one team's win is another team's loss.

7. Conclusion

We have provided a reasonably comprehensive dataset for the 2000 NFL regular season. Furthermore, we presented and summarized some of our exploratory analyses on it. We believe that the dataset would be a good example for use in multivariate statistics courses.

8. Getting the Data

The file nfl2000.dat.txt contains the raw data. The file nfl2000.txt is a documentation file containing a brief description of the dataset.

Appendix - Key To Variables in nfl2000.dat.txt

All rate variables use the total time of possession, that is, the average time of possession times the number of games, as the denominator.

Each variable is provided for both the team of interest and their opponents -- the former will be prefixed by "home" and the latter will be prefixed by "opp."

Also included in this data set, but not used in the corresponding paper, are longest kickoff return (kolong), number of points (points), rate of first downs (1rate), and turnover rate (torate = number of interceptions plus number of fumble recoveries, divided by time of possession).

 Columns  Variable      Description
  1 -   3 initials      team initials
  5 -  26 team          name and location of the team
 28 -  29 wins          wins
 31 -  32 losses        losses
 34 -  35 homedrives50  drives begun in opponents' territory
 37 -  38 homedrives20  drives begun within 20 yards of the goal
 40 -  41 oppdrives50   opponents drives begun in team's territory
 43       oppdrives20   opponents drives begun within 20 yards of goal
 45       homepuntblock punts blocked by team
 47       opppuntblock  punts team had blocked
 49 -  50 hometouch     touchdowns scored by team
 52 -  53 opptouch      touchdowns scored against team
 55 -  58 homeyards     total yardage gained by offense
 60 -  63 oppyards      total yardage allowed by defense
 65 -  68 hometop       time of possession by offense (in minutes)
 70 -  73 opptop        time of possession by opponents' offense
 75 -  76 homefgm       field goals made
 78 -  79 oppfgm        field goals allowed to opponents
 81 -  82 homefga       field goals attempted
 84 -  85 oppfga        field goals attempted by opponents
 87 -  89 opppuntno     punts made by opponents
 91 -  94 opppuntave    average length of punts made by opponents
 96 -  99 opppuntnet    average change in field position 
                        during opponents' punts
101 - 102 opppunttb     opponents' punts taken for touchbacks
104 - 105 opppunt20     opponents' punts that resulted in the team's
                        offense beginning within 20 yards of their 
                        own (defensive) goal line
107 - 108 opppuntlong   longest opponents' punt
110 - 112 homepuntno    punts made by team
114 - 117 homepuntave   average length of punts made by team
119 - 122 homepuntnet   average change in field position 
                        during team's punts
124 - 125 homepunttb    team's punts taken for touchbacks
127 - 128 homepunt20    team's punts that resulted in the opponents'
                        offense beginning within 20 yards of their 
                        own (defensive) goal line
130 - 131 homepuntlong  longest team punt
133 - 135 home1sts      first downs obtained by offense
137 - 139 opp1sts       first downs allowed by defense
141 - 142 homesacks     sacks achieved by team's defense
144 - 145 oppsacks      sacks allowed by team's offense
147 - 148 homekos       kickoffs made by team
150 - 151 oppkos        kickoffs received by team
153 - 156 homekoyds     yards gained during kickoff returns
158 - 161 oppkoyds      yards allowed to opposition during kickoff returns
163 - 166 homekoave     average yards gained during kickoff returns
168 - 171 oppkoave      average yards allowed during kickoff returns
173 - 175 homekolong    longest kickoff return made by team
177 - 179 oppkolong     longest kickoff return allowed by team
181       homekotds     kickoffs returned for a touchdown by team
183       oppkotds      kickoffs returned for touchdown by opposition
185 - 186 homerets      punts returned by team
188 - 189 opprets       punts returned by opposition
191 - 192 homefc        punts "fair caught" by team
194 - 195 oppfc         punts "fair caught" by opposition
197 - 199 homeretyds    return yardage on punts by team
201 - 203 oppretyds     return yardage on punts by opposition
205 - 208 homeretave    average length of punt returns by team
210 - 213 oppretave     average length of punt returns by opposition
215       homerettds    punts returned by team for a touchdown
217       opprettds     punts returned by opponents for a touchdown
219 - 220 homeint       interceptions made by team's defense
222 - 223 oppint        interceptions made against team's offense
225 - 226 homerecover   fumbles recovered by team's defense
228 - 229 opprecover    fumbles recovered by opposing defenses
231 - 232 numgames      games played by team
234 - 237 opprateyds    average number of yards gained 
                        per minute of possession by opponents
239 - 242 homerateyds   average number of yards gained 
                        per minute of possession by team
244 - 247 opppuntrate   average number of punts 
                        per minute of possession by opponents
249 - 252 homepuntrate  average number of punts 
                        per minute of possession by team
254 - 258 oppratetd     average number of touchdowns 
                        per minute of possession by opponents
260 - 264 homeratetd    average number of touchdowns 
                        per minute of possession by team
266 - 269 winpercent    winning percentage
271 - 275 hometorate    turnovers obtained by team,
                        per minute of possession by opponents
277 - 281 opptorate     turnovers allowed by team, 
                        per minute of possession
283 - 286 home1rate     first downs obtained by team, 
                        per minute of possession
288 - 291 opp1rate      first downs allowed by team's defense, 
                        per minute of possession by opposition
293 - 295 homepoints    points scored by team
297 - 299 opppoints     points scored against team
301 - 303 conference    conference to which the team belongs (AFC or NFC)
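Because nfl2000.dat.txt is a fixed-width file, each field can be recovered by slicing a record at the column positions listed above. The following Python sketch shows the idea for the first four fields only. The dictionary and function names are our own, and the sample record is made up to match the key rather than copied from the file, so the real file's padding and team-name spelling may differ.

```python
# 1-based, inclusive column ranges copied from the key above (subset only).
COLUMNS = {
    "initials": (1, 3),
    "team": (5, 26),
    "wins": (28, 29),
    "losses": (31, 32),
}

def parse_record(line, columns=COLUMNS):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[start - 1:end].strip()
            for name, (start, end) in columns.items()}

# Made-up record laid out according to the key (not copied from the file).
sample = "TEN " + "Tennessee Titans".ljust(22) + " 13" + "  3"
record = parse_record(sample)
wins, losses = int(record["wins"]), int(record["losses"])
print(record["initials"], record["team"], wins, losses)
```

The remaining variables can be read the same way by adding their column ranges to the dictionary and converting the numeric fields with int or float as appropriate.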

Acknowledgements

The authors wish to acknowledge the assistance of our colleagues, Robert Shumway and Alan Fenech, for their helpful comments on a primitive version of this paper. We also thank the Department Editor, Roger Johnson, and two anonymous referees for their suggestions, particularly with respect to the graphs they recommended.

References

Anderson-Cook, C., Thornton, T., and Robles, R. (1997), "Measuring Hockey Powerplay and Penalty Killing Efficiency," in Proceedings of the American Statistical Association Section on Statistics in Sports, Alexandria, VA: American Statistical Association, 11-14.

Johnson, R. A., and Wichern, D. W. (1998), Applied Multivariate Statistical Analysis (4th ed.), Upper Saddle River, NJ: Prentice Hall.

Samaniego, F. J., and Watnik, M. R. (1997), "The Separation Principle in Linear Regression," Journal of Statistics Education [Online], 5(3). (jse.amstat.org/v5n3/samaniego.html)

Venables, W. N., and Ripley, B. D. (1997), Modern Applied Statistics with S-PLUS (2nd ed.), New York: Springer-Verlag.

Mitchell Watnik
Statistical Laboratory
University of California
Davis, CA 95616
watnik@wald.ucdavis.edu

Richard A. Levine
Department of Statistics
University of California
Davis, CA 95616
ralevine@ucdavis.edu

Variable	First PC	Second PC
hometop (htop)	0.289154	-0.268395
hometouch (htd)	0.393211	0.103344
opptouch (otd)	-0.099851	0.417011
homeyards (hyd)	0.395883	0.054983
oppyards (oyd)	-0.056191	0.410637
homeratetd (hrtd)	0.362950	0.177276
oppratetd (ortd)	-0.002532	0.409282
homerateyds (hryds)	0.302423	0.233419
opprateyds (oryds)	0.190472	0.320558
oppdrives20 (odriv20)	-0.239607	0.231254
oppdrives50 (odriv50)	-0.321779	0.138639
home1sts (h1sts)	0.397584	0.070433
opp1sts (o1sts)	-0.115843	0.371349
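
A team's score on a principal component is the loading-weighted sum of its variables. The sketch below copies the loadings from the table above, checks that each loading vector has unit length and that the two are orthogonal (up to rounding), and scores one hypothetical team. It assumes the variables were standardized before the analysis, which this table does not state. The z-scores are made up.

```python
import math

# Loadings copied from the table above: variable -> (first PC, second PC).
LOADINGS = {
    "hometop": (0.289154, -0.268395),
    "hometouch": (0.393211, 0.103344),
    "opptouch": (-0.099851, 0.417011),
    "homeyards": (0.395883, 0.054983),
    "oppyards": (-0.056191, 0.410637),
    "homeratetd": (0.362950, 0.177276),
    "oppratetd": (-0.002532, 0.409282),
    "homerateyds": (0.302423, 0.233419),
    "opprateyds": (0.190472, 0.320558),
    "oppdrives20": (-0.239607, 0.231254),
    "oppdrives50": (-0.321779, 0.138639),
    "home1sts": (0.397584, 0.070433),
    "opp1sts": (-0.115843, 0.371349),
}


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


first_pc = [pair[0] for pair in LOADINGS.values()]
second_pc = [pair[1] for pair in LOADINGS.values()]

# Principal component loading vectors should be orthonormal.
norm_first = math.sqrt(dot(first_pc, first_pc))
norm_second = math.sqrt(dot(second_pc, second_pc))
cross = dot(first_pc, second_pc)


def pc_scores(z):
    """Return (first, second) PC scores for one team's standardized values."""
    first = sum(LOADINGS[v][0] * z[v] for v in LOADINGS)
    second = sum(LOADINGS[v][1] * z[v] for v in LOADINGS)
    return first, second


# Hypothetical team: one standard deviation above average on every
# "home" variable and exactly average on every "opp" variable.
z = {v: (1.0 if v.startswith("home") else 0.0) for v in LOADINGS}
first, second = pc_scores(z)
print(norm_first, norm_second, cross, first, second)
```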

Overall Rank	Team	Conf.	Wins	Overall Score	Offensive Score	Off. Rank	Defensive Score	Def. Rank
1	TEN *	AFC	13	5.38075	1.26479	8	-4.11596	2
2	BAL *	AFC	12	4.66794	0.31373	12	-4.35421	1
3	WAS	NFC	8	2.14909	0.25058	15	-1.89851	5
4	OAK *	AFC	12	1.86825	2.92732	3	1.05907	21
5	MIA *	AFC	11	1.84852	-0.67641	21	-2.52493	3
6	NYG *	NFC	12	1.75330	0.67552	11	-1.07779	7
7	DEN *	AFC	11	1.55059	4.16786	2	2.61728	28
8	PIT	AFC	9	1.51409	-0.42227	19	-1.93636	4
9	NO *	NFC	10	1.23657	1.05029	10	-0.18628	12
10	PHI *	NFC	11	1.19001	0.02921	17	-1.16080	6
11	IND *	AFC	10	1.18673	2.72832	4	1.54159	23
12	BUF	AFC	8	1.02534	0.32804	13	-0.69730	10
13	JAX	AFC	7	0.89662	1.07693	9	0.18031	13
14	NYJ	AFC	9	0.86557	0.23653	16	-0.62903	11
15	TB *	NFC	10	0.60569	-0.29121	18	-0.89690	9
16	STL *	NFC	10	0.27546	4.78642	1	4.51096	31
17	GB	NFC	9	-0.03321	0.31085	14	0.34406	15
18	DET	NFC	9	-0.22774	-1.15149	23	-0.92375	8
19	KC	AFC	7	-0.37763	1.31450	7	1.69212	25
20	MIN *	NFC	11	-0.54273	1.94430	6	2.48703	27
21	SF	NFC	6	-0.88587	2.33624	5	3.22212	30
22	CAR	NFC	7	-1.15463	-0.59112	20	0.56351	17
23	NE	AFC	5	-1.78054	-1.29337	24	0.48717	16
24	DAL	NFC	5	-2.11107	-1.46166	25	0.64941	18
25	CHI	NFC	5	-2.59514	-2.26068	27	0.33447	14
26	ATL	NFC	4	-2.98308	-2.10208	26	0.88099	20
27	SEA	AFC	6	-3.74492	-0.68732	22	3.05760	29
28	CIN	AFC	4	-4.15162	-3.33577	30	0.81585	19
29	SD	AFC	1	-4.65210	-3.03908	29	1.61302	24
30	ARI	NFC	3	-4.84552	-2.47209	28	2.37343	26
31	CLE	AFC	3	-6.09758	-4.90381	31	1.19377	22
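
The rows above are consistent with the overall score being the offensive score minus the defensive score, where a lower defensive score indicates a better defense. The following quick check uses a handful of rows copied from the table. The tolerance is an assumption that allows for rounding in the printed values.

```python
# (team, overall, offensive, defensive) copied from the table above.
ROWS = [
    ("TEN", 5.38075, 1.26479, -4.11596),
    ("BAL", 4.66794, 0.31373, -4.35421),
    ("OAK", 1.86825, 2.92732, 1.05907),
    ("DEN", 1.55059, 4.16786, 2.61728),
    ("STL", 0.27546, 4.78642, 4.51096),
    ("CLE", -6.09758, -4.90381, 1.19377),
]

TOLERANCE = 5e-5  # allows for rounding to five decimal places


def overall_score(offensive, defensive):
    """Overall score as offense minus defense (high defense score = poor defense)."""
    return offensive - defensive


# Largest absolute gap between the printed and recomputed overall scores.
max_gap = max(abs(overall - overall_score(off, dfn))
              for _, overall, off, dfn in ROWS)
all_consistent = max_gap < TOLERANCE
print(all_consistent)
```

Under this reading, St. Louis has the top offensive score and the worst defensive score, and the two nearly cancel, which leaves the Rams in the middle of the overall ranking.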

Rank	Team	Conference	First PC score	Wins
1	BAL *	AFC	6.92874	12
2	TEN *	AFC	6.62203	13
3	OAK *	AFC	3.19638	12
4	DEN *	AFC	3.05194	11
5	MIA *	AFC	2.44269	11
6	GB	NFC	1.56921	9
7	STL *	NFC	1.50917	10
8	PIT	AFC	1.38003	9
9	JAX	AFC	1.30208	7
10	NYG *	NFC	1.26682	12
11	TB *	NFC	1.14953	10
12	IND *	AFC	0.96163	10
13	PHI *	NFC	0.49389	11
14	NO *	NFC	0.40795	10
15	WAS	NFC	0.22149	8
16	DET	NFC	-0.05276	9
17	CAR	NFC	-0.36193	7
18	NE	AFC	-0.47124	5
19	BUF	AFC	-1.19575	8
20	KC	AFC	-1.35185	7
21	NYJ	AFC	-1.48123	9
22	SF	NFC	-1.48244	6
23	DAL	NFC	-1.65470	5
24	MIN *	NFC	-1.69589	11
25	CHI	NFC	-2.40573	5
26	ATL	NFC	-2.58762	4
27	SEA	AFC	-2.78394	6
28	CIN	AFC	-5.20318	4
29	SD	AFC	-5.26384	1
30	ARI	NFC	-5.41074	3
31	CLE	AFC	-8.13448	3

Variable	First PC
hometouch	0.343352
homeyards	0.348787
hometop	0.232946
homeratetd	0.323398
oppsacks	-0.132067
homerateyds	0.288800
homedrives20	0.053379
homedrives50	0.085324
homeFGa	0.119071
homeFGM	0.118911
home1sts	0.355936
home1rate	0.351317
homepuntno	-0.291337
oppint	-0.096390
opprecover	-0.100766
homepuntrate	-0.328446

Rank	Team	Conference	Wins	Total score	Off. score	Def. score	Punting score
1	TEN *	AFC	13	8.1776	1.05733	-6.83163	0.57723
2	OAK *	AFC	12	5.6500	3.64252	-0.80435	2.40616
3	BAL *	AFC	12	5.4724	0.50198	-6.34898	-2.75720
4	DEN *	AFC	11	4.5540	5.32779	-0.54343	-2.63447
5	STL *	NFC	10	3.8212	6.30485	2.39239	-0.18245
6	NYG *	NFC	12	3.3699	1.01238	-2.74760	-0.78009
7	IND *	AFC	10	3.0062	3.26709	0.42423	0.32670
8	JAX	AFC	7	2.9492	1.52267	-1.93469	-1.01628
9	NYJ	AFC	9	2.8705	0.40729	-1.03838	2.84974
10	MIA *	AFC	11	2.8688	-1.70385	-2.72417	3.69706
11	NO *	NFC	10	3.0257	1.24803	-1.90010	-0.24487
12	PHI *	NFC	11	2.1248	-0.30602	-1.01290	2.83591
13	TB *	NFC	10	2.2483	-0.26549	-1.66600	1.69555
14	MIN *	NFC	11	1.7890	2.29798	2.73073	4.44349
15	PIT	AFC	9	1.5447	-0.48368	-1.87234	0.31219
16	GB	NFC	9	1.4350	1.02025	-1.21816	-1.60686
17	WAS	NFC	8	0.5425	0.31602	-1.80931	-3.16574
18	BUF	AFC	8	-0.1661	-0.06044	-1.70203	-3.61544
19	DET	NFC	9	-0.4335	-1.80718	-0.80075	1.14591
20	KC	AFC	7	-0.8891	0.77321	1.13140	-1.06190
21	NE	AFC	5	-2.1315	-1.65604	1.15651	1.36214
22	DAL	NFC	5	-2.0687	-0.96984	1.92338	1.64898
23	SF	NFC	6	-2.5398	2.53445	3.06935	-4.00972
24	CAR	NFC	7	-3.4113	-0.25456	1.59191	-3.12961
25	CHI	NFC	5	-3.6990	-3.40067	0.23609	-0.12443
26	SD	AFC	1	-4.5464	-3.27735	2.14002	1.74198
27	ATL	NFC	4	-4.8164	-2.61938	2.18382	-0.02650
28	SEA	AFC	6	-6.0889	-1.15941	4.24062	-1.37769
29	CIN	AFC	4	-6.6120	-4.14713	2.73793	0.54603
30	ARI	NFC	3	-7.1163	-2.28034	4.88523	0.09852
31	CLE	AFC	3	-10.9308	-6.84243	4.11121	0.04566