NAME: Tour De France Winners (Can Lance Win Six?)
TYPE: Time series
SIZE: 56 observations, 13 variables
DESCRIPTIVE ABSTRACT:
Data are provided for the 56 Tour De France bicycle races since World
War II. The year and dates of the event, the total number of stages,
the total distance, the winning total time and average speed, the name
and country of the winner, the birth date of the winner, and the
winner's age at the time of victory are the variables in the dataset.
SOURCES:
Basic background on the Tour and the winners' names and countries
were obtained from the BBC
(http://news.bbc.co.uk/sport1/hi/other_sports/cycling/2064819.stm).
These names were entered into a variety of search engines in order
to find biographical data (such as birth dates) for the winners.
Common sources were individual riders' Web pages, the Cycling Hall
of Fame (http://www.cyclinghalloffame.com), the sports archives of
L'Equipe (http://www.lequipe.fr/), and other sports history sites
on the Web.
Other details of the races (such as locations and dates) were
primarily obtained from a historical database
(www.letour.fr/2002/us/multiCriteres_2002.html) and Velo News
(www.velonews.com).
DATASET LAYOUT:
Each year's race occupies a single line of a plain-text file.
Values are separated by spaces and aligned to the column positions
listed below.
Columns   Description
1-4       Year: The year of the event.
6-9       Start-Town: The town in which the first stage started (the
          last stage always finishes in Paris).
25-32     Start-Date: The day of the opening race.
34-41     End-Date: The day of the final stage.
43-44     Stages: The total number of stages in the race. Most
          years have a prologue that is counted as a stage. Further,
          some years have some stages split into two parts that are
          still counted as a single stage. Therefore, these stage
          numbers are the officially stated number of stages and not
          necessarily the exact number of separate races in a
          particular year.
46-49     Distance: The total distance of all stages, prologues, and
          sub-stages in kilometers.
51-55     Speed: The average speed of the winner in kph.
57-62     Time: The total riding time of the winner in hours.
64-73     Winner: The name of the winner.
84-95     Country: The home country of the winner.
96-103    Birth-Date: The winner's date of birth.
105-106   Age-Year: The winner's age at the time of victory rounded
          to the nearest year.
108-111   Age-Tenth: The winner's age at the time of victory rounded
          to the nearest tenth of a year.
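The layout above can be read by slicing each line at the listed column positions. The following is a minimal Python sketch, not the author's code. It assumes the positions are exactly as documented; the function names, the file path passed to read_tour_file, and the default encoding are hypothetical. Some documented ranges leave gaps (for example, Start-Town ends at column 9 while Start-Date begins at column 25), so the positions should be checked against the actual file before relying on them.

```python
# Column positions (1-based, inclusive) copied from the layout table.
# Assumption: the data file matches these positions exactly.
COLUMNS = [
    ("Year", 1, 4),
    ("Start-Town", 6, 9),
    ("Start-Date", 25, 32),
    ("End-Date", 34, 41),
    ("Stages", 43, 44),
    ("Distance", 46, 49),
    ("Speed", 51, 55),
    ("Time", 57, 62),
    ("Winner", 64, 73),
    ("Country", 84, 95),
    ("Birth-Date", 96, 103),
    ("Age-Year", 105, 106),
    ("Age-Tenth", 108, 111),
]


def parse_record(line):
    """Slice one fixed-width line into a dict of stripped strings."""
    return {name: line[start - 1:end].strip() for name, start, end in COLUMNS}


def read_tour_file(path, encoding="utf-8"):
    """Read every non-blank line of the file at `path` (hypothetical name).

    The encoding is an assumption; adjust it if accented names look wrong.
    """
    with open(path, encoding=encoding) as handle:
        return [parse_record(row.rstrip("\n")) for row in handle if row.strip()]
```

All values are returned as text; numeric fields such as Distance or Age-Tenth can be converted with int() or float() once the column positions have been confirmed.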
NOTES:
Speed, Distance, and Time can be used to cross-check one another, since
Speed should equal Distance divided by Time. This check turned up several
inconsistencies. First, two sources disagreed on the total distance of the
1999 race: one gave 3687 km and the other 3870 km. Because 3870 km was
consistent with the Speed and Time values, it was used in the dataset.
Second, the Time and Speed values for 1947 and 1952 were found in only one
source and were slightly inconsistent with each other (about 0.5 hours off
over the entire race). The average speed for those years was therefore
recalculated, on the assumption that total time and total distance were
more likely to be accurate.
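The cross-check described in these notes amounts to comparing the recorded Speed with Distance divided by Time. A minimal Python sketch follows; the function names and the tolerance are illustrative assumptions, and Time is assumed to be stored in decimal hours.

```python
def implied_speed(distance_km, time_hours):
    """Average speed in kph implied by a distance and a riding time."""
    return distance_km / time_hours


def is_consistent(distance_km, time_hours, speed_kph, tolerance_kph=0.05):
    """True when the recorded speed agrees with distance / time.

    The tolerance is an arbitrary illustrative value, not one from the notes.
    """
    return abs(implied_speed(distance_km, time_hours) - speed_kph) <= tolerance_kph


def pick_consistent_distance(candidates_km, time_hours, speed_kph):
    """Choose the candidate distance that best matches speed and time."""
    return min(
        candidates_km,
        key=lambda d: abs(implied_speed(d, time_hours) - speed_kph),
    )
```

Given two conflicting distances from different sources, pick_consistent_distance returns the one whose implied speed is closest to the recorded Speed, which mirrors the way the 1999 disagreement was resolved.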
STORY BEHIND THE DATA:
Started in 1903 as a newspaper publicity stunt, the Tour De France
has grown into one of the world's most viewed sporting events
(third behind the Olympics and World Cup Soccer). While there have been
many repeat winners in the race's history, only four men have won it
five times each. American Lance Armstrong's recent fourth victory has
brought the Tour more attention in the U.S. and the question being
hotly debated in the cycling community is whether or not he can be the
first to win six times. The identity and birth date of every winner
since World War II were researched on the Internet. Each winner's age at
victory was then computed to the nearest year and the nearest tenth of a year.
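The age computation can be sketched in Python as follows. The exact method used for the dataset is not stated, so this sketch assumes that age is the number of days between the birth date and the final stage divided by 365.25; the function names are hypothetical.

```python
from datetime import date

# Assumption: an average year length that allows for leap years.
DAYS_PER_YEAR = 365.25


def age_in_years(birth, victory):
    """Fractional age in years between two datetime.date values."""
    return (victory - birth).days / DAYS_PER_YEAR


def age_columns(birth, victory):
    """Return (Age-Year, Age-Tenth): age rounded to a year and to a tenth."""
    exact = age_in_years(birth, victory)
    return round(exact), round(exact, 1)
```

For example, a rider born on date(1970, 1, 1) who won on date(2000, 7, 1) (made-up dates, not taken from the dataset) would be recorded with an Age-Year of 30 and an Age-Tenth of 30.5.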
PEDAGOGICAL NOTES:
It should be noted that several variables can be computed from the others.
This gives the instructor some flexibility over the types of analyses done
and the variables provided to the students. In the author's classes, the
focus of this data has been on the age of the winner.
A time-series plot will show that there has been no trend (up or down) in
the winners' ages since WWII, although there are small trends when each of
the multiple winners was in his prime (after all, they got one year older
each year!). Given the lack of overall trend, a simple stem-and-leaf plot
with split stems shows a remarkably symmetric distribution of winners' ages
and can lead to discussion of the limits of training and nutrition over
advancing years.
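For instructors who want to generate the split-stem display by computer, here is a minimal Python sketch. It assumes whole-year ages (the Age-Year column), uses the tens digit as the stem, and puts leaves 0-4 and 5-9 on separate rows; the function name and the "*" and "." row markers are illustrative choices.

```python
def split_stem_and_leaf(values):
    """Return the rows of a stem-and-leaf plot with each stem split in two.

    Rows marked '*' hold leaves 0-4 and rows marked '.' hold leaves 5-9.
    """
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(int(value), 10)
        half = 0 if leaf < 5 else 1
        rows.setdefault((stem, half), []).append(leaf)
    if not rows:
        return []

    lines = []
    key, last = min(rows), max(rows)
    while key <= last:
        stem, half = key
        leaves = "".join(str(leaf) for leaf in rows.get(key, []))
        lines.append(f"{stem}{'*' if half == 0 else '.'} | {leaves}")
        # Step to the next half-stem so that empty rows are still printed.
        key = (stem, 1) if half == 0 else (stem + 1, 0)
    return lines
```

Calling print("\n".join(split_stem_and_leaf(ages))) on the list of winners' ages displays the plot one row per half-stem.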
The derivation of the data itself can lead to class discussion. Since the
Tour went on an eight-year hiatus during WWII, it seemed logical to use
post-WWII winners to study the "modern" era of bicycle racing. However,
Gino Bartali's 1948 victory was a repeat. He won his first Tour ten years
earlier, before the war, making the WWII cut-off appear somewhat arbitrary.
Students with knowledge of cycling history may question why the timing of
some specific technological advance (such as derailleur gearing or aluminum
frames) wasn't used. This is a good opportunity to discuss the many
decisions that must be made in real data collection.
Once students have studied the age distribution, they can separate the
data into multiple-time winners and one-time winners. Then a distribution
of "age at final victory" for multiple-time winners can be developed.
Students can also do "what if" analysis to see how six victories in a row
for Lance Armstrong would alter both the original distribution and the
"final victory" distribution. Z-scores can be used to estimate whether
or not these additional victories by Armstrong would be unusual
with respect to each distribution. If they do this correctly, they will
find that six victories in a row by Armstrong would place him among the
oldest winners overall and make him the oldest to win multiple Tours
since WWII.
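The grouping and z-score steps can be sketched in Python as follows. The function names are hypothetical, records are assumed to be (winner, age) pairs taken from the Winner and Age-Tenth columns, and the sample standard deviation is used, which is an assumption about how the class would compute it.

```python
from collections import defaultdict
from statistics import mean, stdev


def final_victory_ages(records):
    """Map each multiple-time winner to his age at his last victory.

    `records` is an iterable of (winner_name, age_at_victory) pairs.
    """
    ages_by_winner = defaultdict(list)
    for winner, age in records:
        ages_by_winner[winner].append(age)
    return {
        winner: max(ages)
        for winner, ages in ages_by_winner.items()
        if len(ages) > 1
    }


def z_score(value, sample):
    """Standardize `value` against `sample` using the sample std. deviation."""
    return (value - mean(sample)) / stdev(sample)
```

For the "what if" analysis, students can compute the age Armstrong would be at a fifth and sixth victory and pass each to z_score, first against all winners' ages and then against the values of final_victory_ages.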
SUBMITTED BY:
Thomas G. Groleau
Carthage College
314 Lentz Hall
Kenosha, WI 53140
tgroleau@carthage.edu
--