NAME: Tour De France Winners (Can Lance Win Six?)
TYPE: Time series
SIZE: 56 observations, 13 variables

DESCRIPTIVE ABSTRACT:
Data are provided for the 56 Tour De France bicycle races since World War II. The variables in the dataset are the year and dates of the event, the total number of stages, the total distance, the winning total time and average speed, the name and country of the winner, the birth date of the winner, and the winner's age at the time of victory.

SOURCES:
Basic background on the Tour and the winners' names and countries were obtained from the BBC (http://news.bbc.co.uk/sport1/hi/other_sports/cycling/2064819.stm). These names were entered into a variety of search engines in order to find biographical data (such as birth dates) for the winners. Common sources were individual riders' Web pages, the Cycling Hall of Fame (http://www.cyclinghalloffame.com), the sports archives of L'Equipe (http://www.lequipe.fr/), and other sports history sites on the Web. Other details of the races (such as locations and dates) were primarily obtained from a historical database (www.letour.fr/2002/us/multiCriteres_2002.html) and the Velo News (www.velonews.com).

DATASET LAYOUT:
Each year's race is on a single line of a text file with line breaks. Values are delimited by spaces.

Columns   Description
1-4       Year: The year of the event.
6-9       Start-Town: The town the first stage started in (the last stage always finishes in Paris).
25-32     Start-Date: The day of the opening race.
34-41     End-Date: The day of the final stage.
43-44     Stages: The total number of stages in the race. Most years have a prologue that is counted as a stage. Further, in some years a stage is split into two parts but still counted as a single stage. Therefore, these stage numbers are the officially stated number of stages and not necessarily the exact number of separate races in a particular year.
46-49     Distance: The total distance of all stages, prologues, and sub-stages in kilometers.
51-55     Speed: The average speed of the winner in kph.
57-62     Time: The total riding time of the winner in hours.
64-73     Winner: The name of the winner.
84-95     Country: The home country of the winner.
96-103    Birth-Date: The winner's date of birth.
105-106   Age-Year: The winner's age at the time of victory rounded to the nearest year.
108-111   Age-Tenth: The winner's age at the time of victory rounded to the nearest tenth of a year.

NOTES:
Speed, Distance, and Time can be used to cross-check each other, since Speed should equal Distance divided by Time. In checking these numbers, several inconsistencies were found. First, two sources disagreed on the total distance of the 1999 race. One said 3687 km and the other said 3870 km. Since 3870 was consistent with the Speed and Time values, it was selected for the dataset. Second, the Time and Speed values for 1947 and 1952 were each found in only one source but were slightly inconsistent (about 0.5 hours off over the entire race). Thus, the average speed was recalculated under the assumption that total Time and total Distance were more likely to be accurate.

STORY BEHIND THE DATA:
Started in 1903 as a newspaper publicity stunt, the Tour De France has grown into one of the world's most viewed sporting events (third behind the Olympics and World Cup Soccer). While there have been many repeat winners in the race's history, only four men have won it five times each. American Lance Armstrong's recent fourth victory has brought the Tour more attention in the U.S., and the question being hotly debated in the cycling community is whether or not he can be the first to win six times. The identity and birth date of all winners since World War II were researched on the Internet. Then each winner's age was computed to the nearest year and to the nearest tenth of a year.

PEDAGOGICAL NOTES:
It should be noted that several variables can be computed from the others. This gives the instructor some flexibility over the types of analyses done and the variables provided to the students.
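
As an illustration of how some variables can be derived from the others, the following is a minimal Python sketch, not the author's own procedure. It assumes Speed is Distance divided by Time and that age is elapsed days divided by 365.25; the function names, the 0.1-hour tolerance, and the sample numbers are hypothetical and are not taken from the dataset.

```python
from datetime import date


def average_speed_kph(distance_km: float, time_hours: float) -> float:
    """Average speed implied by the Distance and Time variables."""
    if time_hours <= 0:
        raise ValueError("time_hours must be positive")
    return distance_km / time_hours


def age_in_years(birth: date, victory: date) -> float:
    """Age at victory in fractional years (assumes a 365.25-day year)."""
    return (victory - birth).days / 365.25


def is_consistent(distance_km: float, speed_kph: float,
                  time_hours: float, tol_hours: float = 0.1) -> bool:
    """True if Distance / Speed agrees with Time to within tol_hours.

    The default tolerance is a hypothetical choice for illustration.
    """
    return abs(distance_km / speed_kph - time_hours) <= tol_hours


# Illustrative values only; these are not rows from the dataset.
speed = average_speed_kph(4000.0, 100.0)
age = age_in_years(date(1970, 1, 1), date(2000, 1, 1))
age_year = round(age)       # analogous to Age-Year
age_tenth = round(age, 1)   # analogous to Age-Tenth
```

An instructor could withhold Speed or the age columns and ask students to rebuild them this way, or run a check like is_consistent over each year to look for discrepancies such as those described in the NOTES.
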

In the author's classes, the focus of this data has been on the age of the winner. A time-series plot will show that there has been no trend (up or down) in the winners' ages since WWII, although there are small trends when each of the multiple winners was at his prime (after all, they got one year older each year!). Given the lack of overall trend, a simple stem-and-leaf plot with split stems shows a remarkably symmetric distribution of winners' ages and can lead to discussion of the limits of training and nutrition over advancing years.

The derivation of the data itself can lead to class discussion. Since the Tour went on an eight-year hiatus during WWII, it seemed logical to use post-WWII winners to study the "modern" era of bicycle racing. However, Gino Bartali's 1948 victory was a repeat. He won his first Tour ten years earlier, before the war, making the WWII cut-off appear somewhat arbitrary. Students with knowledge of cycling history may question why the timing of some specific technological advance (such as derailleur gearing or aluminum frames) wasn't used. This is a good opportunity to discuss the many decisions that must be made in real data collection.

Once students have studied the age distribution, they can separate the data into multiple-time winners and one-time winners. Then a distribution of "age at final victory" for multiple-time winners can be developed. Students can also do "what if" analysis to see how six victories in a row for Lance Armstrong would alter both the original distribution and the "final victory" distribution. Z-scores can be used to assess whether or not these additional victories by Armstrong would be unusual with respect to the distribution. If they do this correctly, they will find that six victories in a row would place Armstrong among the oldest winners overall and make him the oldest to win multiple Tours since WWII.

SUBMITTED BY:
Thomas G. Groleau
Carthage College
314 Lentz Hall
Kenosha, WI 53140
tgroleau@carthage.edu