NAME: 1969-2000 Major League Baseball Attendance data
TYPE: Census
SIZE: 838 team/seasons (records), 10 variables (fields)

DESCRIPTIVE ABSTRACT:

Data are from The Baseball Encyclopedia (1993) and Total Baseball (2001). They include the location, league affiliation (National or American), division affiliation (East, Central, or West), season of play, home game attendance, runs scored, runs allowed, wins, losses, and number of games behind the division leader for each major league franchise for the 1969 through 2000 seasons. Other data (including opening dates for new stadia, and dates of work stoppages) were collected from Ballparks by Munsey and Suppes (2001) and InfoPlease (2001).

VARIABLE DESCRIPTIONS:

This dataset was taken from The Baseball Encyclopedia (1993) and Total Baseball (2001). Location of the franchise represents the home city or state of the franchise. Season, league affiliation (National or American), and division affiliation (East, Central, or West) are self-explanatory. Home game attendance represents the reported number of fans in attendance for the franchise's home games during the corresponding season. Runs scored and runs allowed represent the sums of runs scored and runs allowed by the franchise over all games during the corresponding season. Wins and losses represent the total number of games won and lost by the franchise during the corresponding season. Finally, number of games the jth team is behind the division leader is calculated as

     0.5*[(w1 - wj) + (lj - l1)]

where

     w1 represents the wins by the first-place team
     wj represents the wins by the jth team
     l1 represents the losses by the first-place team
     lj represents the losses by the jth team

This is a measure of the distance between the division winner and the team in question at the end of the season.

Note: The JSE dataset MLBattend.dat contains the data in a fixed column format with a single row of data for each combination of franchise and season.
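The games-behind formula given under VARIABLE DESCRIPTIONS can be checked with a short calculation. The following is a minimal Python sketch and is not part of the original dataset documentation; the function and argument names are hypothetical, and the win-loss records in the comments are made-up values chosen only to illustrate the arithmetic.

```python
def games_behind(leader_wins, leader_losses, team_wins, team_losses):
    """Games behind the division leader: 0.5*[(w1 - wj) + (lj - l1)].

    leader_wins / leader_losses are w1 / l1 (the first-place team);
    team_wins / team_losses are wj / lj (the jth team).
    All argument names are hypothetical labels for this sketch.
    """
    return 0.5 * ((leader_wins - team_wins) + (team_losses - leader_losses))


# Made-up records, for illustration only:
# leader at 93-69, team at 90-72 -> 0.5 * (3 + 3) = 3.0
print(games_behind(93, 69, 90, 72))
# leader at 100-62, team at 95-66 (one game fewer played) -> 0.5 * (5 + 4) = 4.5
print(games_behind(100, 62, 95, 66))
```

The half-game results arise when two teams have played different numbers of games, which is why the value is treated as a decimal quantity rather than an integer.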
The format for MLBattend.dat is described below.

DATASET LAYOUT:

Fixed column format with one data line per franchise per season

Columns    Variable
 1 –  4    Major League Baseball franchise
 9 – 10    League affiliation (National or American)
16 – 19    Division affiliation (East, Central, or West)
25 – 26    Season
32 – 38    Home game attendance
43 – 46    Runs scored
51 – 54    Runs allowed
59 – 61    Wins
66 – 68    Losses
74 – 77    Number of games behind the division winner

There are no missing values in this dataset.

STORY BEHIND THE DATA:

This dataset includes variables that could potentially explain variation in recent attendance for Major League Baseball franchises. Thus, these data are of interest to anyone wishing to study the relationships between attendance and various performance variables of the teams, and are potentially useful in applied statistics and economics/econometrics courses.

Additional information about these data can be found in the "Datasets and Stories" article "Data Management, Exploratory Data Analysis, and Regression Analysis with 1969-2000 Major League Baseball Attendance" in the _Journal of Statistics Education_ (Cochran 2002, www.amstat.org/v10n2/datasets.cochran.html).

PEDAGOGICAL NOTES:

This dataset has been used in an undergraduate/Master's level capstone course in data analysis offered by the University of Cincinnati's College of Business Administration. The students enrolled in this course are working toward either an undergraduate major or a Master's degree in quantitative analysis. The purpose of the course (and use of the dataset) is to provide the students with a full experience in analyzing data through a comprehensive data analysis project.

The class is divided into groups of three or four students, each of which is provided with a diskette containing a unique part of a dataset. These data include various instructor-induced anomalies such as missing values, repeated observations, and misplaced decimals.
The subsets of the data that are distributed to the student groups do not overlap, and together they form the complete dataset to be used for the course. No subset contains data for all of the years to be utilized, nor does any subset contain data for every team. Students work cooperatively to clean their data and aggregate the results into a complete dataset. Throughout the course, students are instructed in exploratory data analysis, data management, topics in statistical modeling, and the role of a statistical consultant. Final grades are based primarily on the quality of the final model, a written report, and a presentation of results given to an audience composed of classmates, faculty, and Ph.D. students.

REFERENCES:

Frees, E. W. (1996), Data Analysis Using Regression Models, Englewood Cliffs: Prentice Hall.

InfoPlease (2001), [Online]. (www.infoplease.com/index.html)

Munsey, P., and Suppes, C. (2001), Ballparks [Online]. (www.ballparks.com/baseball)

Reichler, J. L. (ed.) (1993), The Baseball Encyclopedia, New York: MacMillan Publishing Company.

Thorn, J., and Palmer, P. (2001), Total Baseball, New York: HarperCollins Publishers.

SUBMITTED BY:

James J. Cochran
Department of Economics and Finance
Louisiana Tech University
Ruston, LA, USA 71272
jcochran@cab.latech.edu
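
EXAMPLE (READING THE FIXED COLUMN FORMAT):

As an illustration of the fixed column layout described under DATASET LAYOUT, the following Python sketch splits one data line into its ten fields. It is not part of the original submission. The column positions are copied from the layout table and are treated as 1-indexed and inclusive; the field names, the choice of numeric types, and the helper names parse_record and read_mlbattend are assumptions made for this sketch.

```python
# Column positions (1-indexed, inclusive) copied from the DATASET LAYOUT table.
# Field names and converters are hypothetical choices made for this sketch.
LAYOUT = [
    ("franchise",     1,  4, str),
    ("league",        9, 10, str),
    ("division",     16, 19, str),
    ("season",       25, 26, int),    # two-column season code, kept as stored
    ("attendance",   32, 38, int),
    ("runs_scored",  43, 46, int),
    ("runs_allowed", 51, 54, int),
    ("wins",         59, 61, int),
    ("losses",       66, 68, int),
    ("games_behind", 74, 77, float),  # may include half games
]


def parse_record(line):
    """Split one fixed-column data line into a dict of typed fields."""
    record = {}
    for name, start, end, convert in LAYOUT:
        # Columns start-end (1-indexed, inclusive) map to the slice [start-1:end].
        raw = line[start - 1:end].strip()
        record[name] = convert(raw)
    return record


def read_mlbattend(path):
    """Read every non-blank line of a file laid out as in DATASET LAYOUT."""
    with open(path) as handle:
        return [parse_record(line) for line in handle if line.strip()]
```

Assuming the file has been saved locally as MLBattend.dat, calling read_mlbattend("MLBattend.dat") would return one dictionary per franchise and season. Because the documentation states that there are no missing values, the sketch makes no provision for blank fields; the student subsets with induced anomalies described under PEDAGOGICAL NOTES would need additional checks.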