NAME: pitchfx.dat

TYPE: Population

SIZE: 66269 observations on 29 variables

DESCRIPTIVE ABSTRACT:
Individual pitch data for twenty starting pitchers in the 2009 Major League Baseball season. Ten of the pitchers are "elite" (each has been nominated in recent seasons for the Cy Young award, the main pitching award) and the remaining pitchers are "non-elite".

SOURCE:
This pitch-by-pitch data, together with extra information such as the inning, batter, and result of the plate appearance, is available for free from the Major League Baseball website http://gd2.mlb.com/components/game/mlb/. The author downloaded data from the xml files for 20 starting pitchers in the 2009 season. Data was collected for each of the games that these pitchers played, and the files were combined to create a single data file.

FORMATS:
The play-by-play dataset is available both as a data file "pitchfx.dat" and as an R workspace "pitchfx.Rdata". Tab characters are used to separate variables in the data file.

READING INTO R:
The play-by-play data file can be read into R by the command

    pitchdata=read.delim("http://bayes.bgsu.edu/baseball/pitchfx.dat")

VARIABLE DESCRIPTIONS:
Each row represents information about a given pitch during a particular game.

pitcher         last name of the pitcher
game            game number
id              code identifying the pitcher
inning          inning of game
num             number of batter
batter          code identifying the batter
stand           side of batter (L or R)
b_height        height of batter (feet and inches)
p_throws        side of pitcher (L or R)
des             detailed description of play at end of plate appearance
event           brief description of play at end of plate appearance
brief_event     same as event
des2            pitch outcome (called strike, ball, in-play, etc.)
type            pitch code (B, S, or X (in play))
pfx_z           deviation (in inches) of pitch trajectory in vertical location
px              pitch location in x direction as it crosses home plate
pz              pitch location in z direction as it crosses home plate
start_speed     starting speed of pitch in mph
end_speed       speed of pitch crossing plate in mph
sz_top          top of strike zone in feet
sz_bot          bottom of strike zone in feet
pfx_x           deviation (in inches) of pitch trajectory in horizontal location
count           current pitch count
new.count       new pitch count
value           pitch value using linear runs measure
pitch_type      pitch classification
new.count.type  PA event or new count
count.adv       classification of count as pitcher or batter or neutral count

Missing values are denoted with NA.

PEDAGOGICAL NOTES:
Using the Pitch F/X dataset, one can explore the pitching tendencies of the twenty pitchers. What pitches do they throw, what are the movement and speed of these pitches, and where are they thrown relative to the strike zone? Are particular pitches more successful in getting the batter to swing and miss? How do the pitchers differ with respect to pitch type and the speed at which they throw the pitches?

There are nine pitchers in this dataset that can be regarded as "elite" since they received votes for the Cy Young award, given to the best pitcher in the National League and in the American League. What distinguishes these elite pitchers from the remaining "non-elite" pitchers?
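
As a starting point for the questions above, the following is a minimal sketch in R, not a definitive analysis. The column names (pitcher, pitch_type, start_speed, des2) are taken from the variable descriptions. The helper names pitch_mix, mean_speed, and whiff_rate, and the small toy data frame, are hypothetical and introduced only for illustration. The label "Swinging Strike" for a swing and miss in des2 is an assumption; check it against unique(pitchdata$des2) before relying on it.

```r
# Share of each pitch type thrown by each pitcher (rows sum to 1).
pitch_mix <- function(pitchdata) {
  round(prop.table(table(pitchdata$pitcher, pitchdata$pitch_type),
                   margin = 1), 3)
}

# Mean starting speed (mph) by pitcher and pitch type.
# The formula interface of aggregate() drops rows with NA by default.
mean_speed <- function(pitchdata) {
  aggregate(start_speed ~ pitcher + pitch_type, data = pitchdata, FUN = mean)
}

# Proportion of pitches of each type that end in a swing and miss.
# ASSUMPTION: des2 marks a swing and miss with the label in miss_label.
whiff_rate <- function(pitchdata, miss_label = "Swinging Strike") {
  tapply(pitchdata$des2 == miss_label, pitchdata$pitch_type,
         mean, na.rm = TRUE)
}

# Hypothetical toy data with the same column names, for a quick check.
toy <- data.frame(
  pitcher     = c("A", "A", "A", "B", "B"),
  pitch_type  = c("FF", "FF", "CU", "FF", "SL"),
  start_speed = c(92, 94, 78, 90, NA),
  des2        = c("Ball", "Swinging Strike", "Called Strike",
                  "Swinging Strike", "Ball"),
  stringsAsFactors = FALSE
)

print(pitch_mix(toy))
print(mean_speed(toy))
print(whiff_rate(toy))

# With the real data (requires network access):
# pitchdata <- read.delim("http://bayes.bgsu.edu/baseball/pitchfx.dat")
# pitch_mix(pitchdata)
```

The same three functions can be applied to the full pitchdata data frame once it has been read in. Comparing elite and non-elite pitchers would additionally require a list of which pitchers belong to each group, which is not one of the variables described above.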

REFERENCES:
Albert, Jim, Baseball Data at Season, Play-by-Play, and Pitch-by-Pitch Levels

Nathan, Alan, MLB Extended Gameday Pitch Logs, http://webusers.npl.illinois.edu/~a-nathan/pob/tracking.htm

SUBMITTED BY:
Jim Albert
Department of Mathematics and Statistics
Bowling Green State University
Bowling Green, OH 43403
albert@bgnet.bgsu.edu