Statistical Thinking Activities: Some Simple Exercises With Powerful Lessons

Kim I. Melton
North Georgia College & State University

Journal of Statistics Education Volume 12, Number 2 (2004), jse.amstat.org/v12n2/melton.html

Copyright © 2004 by Kim I. Melton, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Data collection; Operational definitions.

Abstract

Statistical thinking is required for good statistical analysis. Among other things, statistical thinking involves identifying sources of variation. Students in introductory statistics courses seldom recognize that one of the largest sources of variation may come in the collection and recording of the data. This paper presents some simple exercises that can be incorporated into any course (not just statistics) to help students understand some of the sources of variation in data collection. Primary attention is paid to operational definitions used in the data collection process.

1. Introduction

Students in introductory statistics courses (and in other courses that use data) often believe that numbers are objective: “numbers don’t lie.” However, data collection always precedes data analysis, and data collection is a process. Therefore, concepts of statistical thinking should be applied to the data collection process before students accept data for analysis.

The American Society for Quality Glossary of Statistical Terms (1996) defines statistical thinking as a philosophy of learning and action based on the following fundamental principles: 1) all work occurs in a system of interconnected processes, 2) variation exists in all processes, and 3) understanding and reducing variation are keys to success. In real world applications, data collection does not take place in a vacuum. People collect data or program machines to collect data. These people decide what data to collect, how to collect the data, how to record the data, and how to communicate what data were collected. Each step in the data collection process offers the opportunity for variability in interpretation of how that step is to be carried out. Collecting “good” data requires recognizing sources of variation and planning to minimize the impact. Even so, variation can exist between what is reported and what the user wants to study.

Helping students recognize some of the reasons that variation can exist in the data collection process is the first step toward helping them evaluate the usefulness of data for a given situation. Wild (1994) proposes that students in introductory statistics courses should be encouraged to develop mental habits that will help bridge the gap between statistical thinking and technical statistical knowledge. These habits include curiosity and the continual generation of questions. Chance (2002) expands on this concept by pointing out that recent advances in technology allow students to focus on the statistical process that precedes calculations and the interpretation of results rather than the number crunching. She specifically identifies two mental habits that students need to develop to think statistically: 1) consideration of how to best obtain meaningful and relevant data to answer the question at hand, and 2) omnipresent skepticism about the data obtained.

Some simple exercises can help students develop a “healthy” skepticism for data. Wild and Pfannkuch (1999) claim that students need to be given numerous situations where they can address data collection issues and the impact choices may have on conclusions. Since data collection and the use of data are fundamental concepts covered in most introductory statistics courses, hands-on exercises can be used in place of, or along with, other methods of instruction without the need to allocate significant additional time.

The next section presents four such exercises along with a discussion about the lessons they provide for students. The exercises are presented in an order that goes from least likely to be influenced by individual interpretation to most likely to include variation due to decisions made during the data collection process. These exercises can be used with students from elementary school through adults in continuing education classes. For each exercise, the time required, materials needed, potential placement in an introductory statistics course, sample results, and student responses are presented.

2. Exercises

All of these exercises deal with the concept of operational definitions. Operational definitions can be loosely described as descriptions that allow two people to look at the same thing at the same time and record the same measurement. All of these exercises are easy to administer and memorable to students. Some of them can be done on the spur-of-the-moment; others take a little preplanning. None requires specialized equipment.

The "F test". Variations of this exercise are available from a number of sources. This exercise is based on the version distributed by the American Society for Quality (1985).

Placement in the course: Just before the first time any data are used.

Time required:

Approximately five minutes to distribute and explain.
Two minutes to administer.
Five to ten minutes to discuss.

Materials: Individual copies of the following paragraph for each student:

The necessity of training farm hands for first class farms in the fatherly handling of farm livestock is foremost in the minds of effective farm owners. Since the forefathers of the farm owners trained the farm hands for first class farms in the fatherly handling of farm livestock, the farm owners feel they should carry on with the former family tradition of training farmhands of first class farms in the effective fatherly handling of farm live stock, however futile, because of their belief that it forms the basis of effective farm management efforts.

Procedure: Give each student a page with a copy of the paragraph. Ask them to keep the page face down until you explain the exercise and tell them to begin. Provide the following instructions. “Each of you has a copy of the same paragraph. Your job is going to be that of an inspector. As an inspector you are looking for the number of defects in your product. For this exercise, the letter ‘f’ (upper- or lowercase) is considered a defect. Therefore, your job is to count the f’s in the paragraph. There are some constraints. You do work under a time constraint. You will have two minutes to complete your inspection. Also, you are not allowed to alter the product that you are inspecting—so you cannot write on the page.” Confirm that everyone knows what an “f” looks like and respond to any questions. Next, tell them to begin. After two minutes announce, “Stop.” One by one, have students report their answers.

Results: Most people have more than enough time to complete the exercise in two minutes. In fact, most people will have time to attempt the exercise two or three times during this allotted time—and they seldom get the same results each time! Some people get answers in the 25 – 30 range; answers in the mid- to upper-30’s are common; most people report answers in the low- to mid-40’s; the correct answer is 48. Be sure to allow a little time for people to go back and look again—otherwise, they will not be paying attention to whatever you have planned next!
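The stated answer can be checked mechanically, which makes a useful contrast with the human counts. The following is a minimal sketch, not part of the original exercise: the function name `count_defects` is my own, and the paragraph is transcribed from the handout above.

```python
# The inspection paragraph, transcribed from the handout used in the exercise.
PARAGRAPH = (
    "The necessity of training farm hands for first class farms in the "
    "fatherly handling of farm livestock is foremost in the minds of "
    "effective farm owners. Since the forefathers of the farm owners "
    "trained the farm hands for first class farms in the fatherly handling "
    "of farm livestock, the farm owners feel they should carry on with the "
    "former family tradition of training farmhands of first class farms in "
    "the effective fatherly handling of farm live stock, however futile, "
    "because of their belief that it forms the basis of effective farm "
    "management efforts."
)


def count_defects(text, letter="f"):
    """Count occurrences of `letter` in `text`, ignoring case."""
    return text.lower().count(letter.lower())


print(count_defects(PARAGRAPH))  # prints 48
```

Unlike a human inspector, the program returns the same count on every pass, which is one way to open the discussion about repeatability of a measurement process.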

Lessons: In this case, operational definitions should not be a problem. Everyone agrees on what constitutes an “f.” Yet, few people report the right answer. This exercise helps students recognize that 100% inspection is not 100% accurate. Ask students what impact the two-minute limit had on their work. This can lead to a discussion about how management procedures can have unintended consequences. Ask students what they did when they finished before the allotted time. Most will tell you that they went back to “check my answer.” Of course, most will also tell you that when they finished the second time, they had two different answers and had to decide which one to report.

Depending on the level of the course, you can use this to discuss how some organizations do measure items twice and average the two measurements (as a response to a measurement process that has too much variation). You can also use this exercise to talk about 200% inspection, and the possibility that 200% inspection can reduce the accuracy when human inspection is involved (since each inspector thinks the other one will check anything they miss).

Student responses: This exercise has been used in class and as a homework assignment. Regardless of the method of presentation, some students always administer the test to other people. Each semester at least half of the class reports that they have had other people try to count the f’s. Someone in each class since the 2000 presidential election has noted the difficulty with this exercise and the similarity to counting votes in any election.

Several former students who are now faculty members have incorporated this exercise into their own courses. One wrote, "I found the 'f test' amusingly frustrating. How could I be so careful and still be so wrong on something as concrete as counting f’s? Now that I am teaching a course in 'healthcare process measurement' I find that the illustration of counting concrete f’s puts into context the difficulty we face in ‘measuring’ health and healthcare."

How many pages? This exercise was first used in 1987 in Quality Management, an interdisciplinary course for undergraduate and graduate students developed by the author. Since then, it has been used in quality courses (credit and non-credit), statistics courses, and in a number of presentations. Regardless of the group, the results have remained fairly consistent. Occasionally, someone will come up after class and say, "What was the right answer?"

Placement in the course: Near the beginning—before the use of any data that could have been collected by several people

Time required:

One minute at the beginning of the class period to explain.
About 30 minutes between distribution and discussion (This exercise can take place "in the background," i.e., while you continue your lecture.)
About 10 – 15 minutes at the end of the class period to display results and discuss implications.

Materials:

A book; I use Guide to Quality Control (Ishikawa 1986).
One envelope labeled “Paper” containing small, blank slips of paper
One envelope labeled “Answers” for the answers
A page with the following instructions:

Instructions:
Read these instructions completely – THEN:
Take one piece of paper from the envelope marked "Paper."
Count the number of pages in the book, Guide to Quality Control.
Record your answer on the piece of paper that you took from the "Paper" envelope.
Place your answer in the envelope marked “Answers.” Do not look at the other answers.
Pass the book, both envelopes, and the instructions to the next person.
NOTE: You are to work independently – no questions to other workers.

Procedure: At the beginning of class, announce that we will be doing an exercise to collect some data. Each person will have an opportunity to provide input. Start the exercise on one side of the room and give all of the materials listed above to the first person. Tell the class, “When you receive the materials, follow the instructions to the best of your ability. Then pass everything on to the next person. There is no trick.”

Then, continue with the other material planned for that class period. After about 30 minutes, stop the lecture and ask a student to help report the results. Have the student read out the answers, and make a tally of the results on an overhead transparency or on the board.

Results: It is safe to assume that not everyone will report the same result. Most groups will have several people who report results near 248, 236, and in the mid-120’s. Occasionally, there will be a cluster around 225. The results from seven different classes are presented in Table 1.

Table 1. Number of pages reported in Guide to Quality Control

Class 1	Class 2	Class 3	Class 4	Class 5	Class 6	Class 7
333 /	248 ///	248 /	249 /	248 ///	255 /	248 ////
248 /////	247 /	247 /	248 ///	240 /	250 /	236 ///
246 /	246 /	244 //	239 /	230 //	249 /	235 /
244 /	244 /	242 /	237 /	225 ///////	248 //	234 /
236 //////	236 /////	241 //	236 /////	124 //	244 /	139 /
235 //	232 //	240 /	234 /	123 ///	236 ////	126 /
234 /	225 /	238 /	230 /		225 /	123.5 /
230 /	126 /	236 //	225 /		130 /	123 //
225 /	124 //	234 /	124 ///		125 //	119 /
124 ///	122 //	232 //	121 /		122 /	117.5 /
123 /	45 /	214 /	118 /
122 /		133 /	36 /
74 /		126 /
		124 /

Lessons: The results offer a chance to talk about the role of operational definitions. For example, students will usually note that answers tend to cluster around 248, 236, 225, and the mid-120s. The students who recorded 248 explain that they counted each sheet that could have a page number on it—including fronts, backs, and blank sheets; the ones who recorded 236 counted each sheet with print—including the title page and table of contents; the students who responded with 225 looked at the page number at the bottom of the last numbered page; and the ones who reported numbers in the mid-120s considered the front and back of the same sheet of paper as a single page rather than as separate pages. This helps students recognize how different operational definitions may lead to different data. This does not make one definition right and the other wrong—it simply emphasizes the need to communicate what definition is being used and to select a definition that is appropriate to the situation. The choice of the Ishikawa book also provides an opportunity to let students see a common reference book used in industry.
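For readers who want to examine the clustering numerically, here is a minimal sketch that uses the Class 1 tallies from Table 1. The typical counts of 248, 236, and 225 come from the explanations students gave; the value 124 for the "mid-120s" cluster, the tolerance of 3 pages, and the function name `classify` are my own assumptions for illustration.

```python
from collections import Counter
from statistics import mean

# Class 1 tallies transcribed from Table 1: reported page count -> number of students.
CLASS_1 = {333: 1, 248: 5, 246: 1, 244: 1, 236: 6, 235: 2, 234: 1,
           230: 1, 225: 1, 124: 3, 123: 1, 122: 1, 74: 1}

# Typical count produced by each definition described in the text.
# 124 is an assumed center for the "mid-120s" cluster.
DEFINITIONS = {
    248: "every side of every sheet, blanks included",
    236: "every side with print",
    225: "last printed page number",
    124: "each sheet (front and back) as one page",
}

TOLERANCE = 3  # assumption: how far a count may stray and still join a cluster


def classify(pages):
    """Return the definition whose typical count is within TOLERANCE, else 'unexplained'."""
    for typical, label in DEFINITIONS.items():
        if abs(pages - typical) <= TOLERANCE:
            return label
    return "unexplained"


# Expand the tallies into one reported value per student.
reports = [pages for pages, n in CLASS_1.items() for _ in range(n)]
clusters = Counter(classify(pages) for pages in reports)

print("students:", len(reports))
print("pooled mean:", round(mean(reports), 2))
for label, n in clusters.most_common():
    print(f"{n:2d}  {label}")
```

Under these assumptions the pooled mean for Class 1 is roughly 213 pages, a value that no student reported and that matches none of the definitions. Summarizing data collected under mixed definitions can therefore describe nothing in particular.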

Scheaffer, Gnanadesikan, Watkins, and Witmer (1996) provided a similar exercise based on having students measure the diameter of a tennis ball. In this case, students were required to address additional sources of variation that could impact the data collected. For example, the choice of the measurement device and the choice of units of measure could influence the reported results.

Student responses: Students readily see how different definitions lead to different results. They also recognize that many of the reported results are logical once the definition is known. They also express frustration with the lack of a (single) correct answer. Students often reference this exercise on the end of semester course evaluations. For example, one student wrote, "Please include more activities like the day you passed around the book. These really helped me understand what you were talking about."

The first time I used this exercise, there was a student in the class who worked for a printing company that charged clients by the page. The student decided to take the exercise back to his company. His theory was that he would see less variation since their billing was related to the number of pages in a publication. He was surprised when the results did not support his theory. He found a wider range of responses and saw considerable clustering of responses. He started to look at who gave him each answer. He realized that everyone in procurement gave the same answer (based on the amount of paper that would need to be purchased to print the book); everyone in typesetting gave the same answer as each other (based on the number of times they would need to set margins)—but different from the answer given by people in procurement; people in imaging gave the same answer as each other—but different from the other groups; and people on the press gave the same answer as each other—but different from the other groups. As he looked across the plant at a very large room with thousands of dollars of scrapped production, he recognized that this simple exercise helped explain some of the scrap. There were times when someone in typesetting or some other department would find an error, go onto the production floor, and tell them to quit running page x. The printing line would stop running what they erroneously believed to be the page with an error. However, they would continue to run the other pages, including the one with an error, at a higher speed. As a result, they created scrap faster!

"Most" of the time. This exercise works best as a spur-of-the-moment exercise. It can be used to provide a break in a longer presentation, or it can be used as a warm-up exercise to get a group talking.

Placement in the course: Any place where students start to use words that may have different meanings for different people

I use this exercise when we are looking at control charts and beginning to discuss runs tests. I state that most of the points will plot near the centerline when a process is stable. Students usually nod their heads as though they understand. Then I tell them that this is still conceptual—that we need to operationalize the concept of “most” points "near" the centerline.

Time required: About five minutes

Materials: None (A board, a flipchart, or an overhead is useful to display the results.)

Procedure: Students are asked to think about the word "most." (You can use other words—e.g., few, some, almost all, not many.) They are instructed to associate the word "most" with a number between 0 and 100—where 0 corresponds to 0% and 100 corresponds to 100%. Students are asked to close their eyes and keep them closed until instructed otherwise. On the board, write:

0 – 10
11 – 20
21 – 30
...
91 – 100

Then, tell students to "Raise your hand if you are thinking about a number between 0 and 10." Once the count is recorded, proceed to the next interval. After the last interval, tell students they can open their eyes. It is usually quite obvious that not everyone has the same idea of what the word "most" means.

Results: It is not unusual for someone to raise their hand in the 31 – 40 or in the 41 – 50 range. The modal class is usually in the 80’s or 90’s.
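If the numbers are collected on slips of paper rather than by a show of hands, the tally into the intervals written on the board can be sketched as follows. The responses listed are invented purely for illustration and are not data from any class; the function name `interval_label` is also my own.

```python
from collections import Counter


def interval_label(value):
    """Map a whole number from 0 to 100 to the board intervals 0-10, 11-20, ..., 91-100."""
    if not 0 <= value <= 100:
        raise ValueError("value must be between 0 and 100")
    if value <= 10:
        return "0 - 10"
    low = ((value - 1) // 10) * 10 + 1
    return f"{low} - {low + 9}"


# Hypothetical responses, invented for illustration only.
responses = [35, 51, 60, 75, 80, 85, 90, 90, 95, 99]

tally = Counter(interval_label(r) for r in responses)
for label, count in sorted(tally.items(), key=lambda item: int(item[0].split()[0])):
    print(f"{label:>8}  {'/' * count}")
```

Note that the first interval holds eleven whole numbers (0 through 10) while the others hold ten, so even the layout of the board embeds a small definitional choice.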

Lessons: This provides another opportunity to talk about the need for operational definitions, and to point out how simple words can carry different meanings to different people. Communication (especially between customer and supplier) depends on a common language.

This exercise provides a way to talk about a run consisting of a sequence of consecutive points that follow some pattern. How many points are required to call consecutive points a pattern? If multiple people are using the same chart to determine the appropriate action to take, they need a common understanding about how many consecutive points are necessary to call this a pattern and to look for a special cause.
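The effect of the chosen run length can be sketched in a few lines. This is an illustration under stated assumptions: the plotted points, the centerline, and the two thresholds compared are all hypothetical, and the function name `longest_run_one_side` is my own. Only runs of consecutive points on the same side of the centerline are considered.

```python
def longest_run_one_side(points, centerline):
    """Length of the longest run of consecutive points on one side of the centerline."""
    longest = current = 0
    previous_side = 0
    for x in points:
        # +1 above the centerline, -1 below it, 0 exactly on it
        side = (x > centerline) - (x < centerline)
        if side != 0 and side == previous_side:
            current += 1
        elif side != 0:
            current = 1
        else:
            current = 0  # a point on the line breaks the run
        previous_side = side
        longest = max(longest, current)
    return longest


# Hypothetical chart values and centerline, for illustration only.
points = [10.2, 10.1, 9.8, 10.3, 10.4, 10.1, 10.2, 10.5, 9.9]
run = longest_run_one_side(points, centerline=10.0)

# Two people reading the same chart with different run-length definitions.
for threshold in (5, 8):
    action = "look for a special cause" if run >= threshold else "leave the process alone"
    print(f"threshold {threshold}: longest run {run} -> {action}")
```

With these made-up values, the same chart leads one reader to investigate and the other to do nothing, solely because they operationalized "pattern" differently.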

Student responses: The immediate response from students is disbelief. When they open their eyes, their non-verbal responses show their surprise. Later in the semester when we talk about hypothesis testing, I ask students if they remember this exercise. Almost all do. This provides a lead-in for discussing what constitutes sufficient evidence and how this depends on the amount of risk the decision maker is willing to take. Students see that different people could have different ideas about what constitutes sufficient evidence or a small risk.

"Fast" food. This exercise is a little on the "devious" side. The purpose stated to the class at the beginning, although reasonable and desirable, is not the real purpose of the exercise.

Placement in the course: At the very beginning of the course.

I usually start this experiment on the very first day of class. To work effectively, this needs to be done with a large group (or multiple sections).

Time required:

Five minutes to give out the assignment.
Up to one week for students to complete the assignment.
Lots of data entry time between collection of the data and discussion.
10 – 15 minutes to discuss.

Materials: A data collection form.

The data collection form needs to include a place for the following:

Name of data collector:
Day and date data were collected:
Name of the restaurant chain:
Address of specific restaurant:
Time of day visited:
Number of seconds that customer spent in line:
Number of seconds for customer to be served:
(about 2 inches of blank space to be used when they submit their data)

Procedure: On the first day of class when you cover the mechanics of the course (textbooks, grading, topics to be covered, etc.), tell students that you want them to help collect some data that can be used during the semester to illustrate some of the tools and techniques. Tell them, "You must visit one location of McDonald’s, Taco Bell, or Arby’s and collect data on one customer. You do not have to buy anything; you can observe some other customer, or you can be a customer and collect data on your own service experience. The data must come from a customer inside; you cannot use customers at the drive-thru." Explain that you will take the individual observations they collect and compile them into one large data set. Give them the data collection form, and talk about the types of theories that might be tested with the data. For example, does one chain tend to be faster or slower than another chain? Does the day of the week make a difference? Students will often propose other theories. Then, give them a chance to ask questions about the assignment. There are usually very few questions. The most common question is, "Do we need to go to one McDonald's, one Taco Bell, and one Arby's or just one restaurant total?" My response is, "Just one restaurant total." Allow them a week to complete the assignment.

Results: On the day that the assignment is due, ask them to provide a little more information on the bottom of the data collection form. Each student should complete the following four sentences:

I started timing “time in line” when _____________________. (What event signaled you to start timing?)
I stopped timing “time in line” when ____________________.
I started timing “service time” when ____________________.
I stopped timing “service time” when ___________________.

When they finish, have them pass in their data collection forms.

When you compile the data, keep a record of the responses to the four statements. In a group of 100 responses, there may be as many as 20 to 30 different responses to the four fill-in-the-blank statements. Some of the distinctions may be insignificant (e.g., "I started timing ‘service time’ when the cashier finished asking for my order" vs. "I started timing 'service time' when the cashier started asking for my order"), but other distinctions are important (e.g., "I stopped timing 'time in line' when the cashier started to ask for my order" vs. "I stopped timing 'time in line' when I received my completed order"). From 100 students collecting data, there are usually about 40 that have used similar enough starting and ending points to treat them as measuring the same characteristic.
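One way to keep that record is to group the observations by the start and stop events each student reported, so that only observations sharing a definition are compared. The sketch below assumes the free-text answers have already been coded into short, consistent phrases by hand. The records and field layout are hypothetical.

```python
from collections import defaultdict

# Hypothetical coded forms: (start event, stop event, seconds in line).
# These records are invented for illustration, not collected data.
forms = [
    ("joined the line", "cashier started taking order", 45),
    ("joined the line", "cashier started taking order", 60),
    ("entered the door", "received completed order", 180),
    ("joined the line", "received completed order", 150),
]

# Group "time in line" by the operational definition that produced it.
by_definition = defaultdict(list)
for start, stop, seconds in forms:
    by_definition[(start, stop)].append(seconds)

for (start, stop), times in by_definition.items():
    print(f"{len(times)} form(s): from '{start}' to '{stop}': {times}")
```

Only the observations within a single group measure the same characteristic, so any comparison between chains or days should be made inside a group rather than across the pooled data.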

Lessons: In reality, the reason for giving this assignment was not collecting data for use later in the semester (although that is a nice result if good data are collected). The real reason was to have students recognize data collection as a process that requires planning. As part of the planning process some of the issues that should have been considered are determining the questions that could be answered as a result of data collection, designing data collection forms that are easy to use, training data collectors, and anticipating problems. In this exercise, the variation in interpretations can probably be tracked back to a lack of training. This gives a lead-in to process studies, the PDSA Cycle (Plan-Do-Study-Act Cycle), and the collection and use of data.

In addition, you usually have the opportunity to discuss the "believability" of the data. A high percentage of the reported observations often end in 0 or 5 (e.g., 15 seconds or 20 seconds rather than values such as 18 seconds or 21 seconds). Students are able to develop theories about the expected shape of a histogram for "time in line" or "service time." If the horizontal axis represents the amount of time required and the vertical axis represents frequency or relative frequency, most students will agree that the shortest possible time will be zero seconds, that the histogram will tend to be right skewed (since some orders will take considerably longer than the typical order), and that the histogram will be fairly smooth when you have lots of data. The data collected usually produce a histogram that has large spikes at 0, 15, 30, 45, 60, etc. A few students will usually acknowledge that they estimated, or even "created," the times.
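A quick check for this kind of rounding is to compute the share of reported times whose last digit is 0 or 5. If final digits were spread evenly, about 2 in 10 values (20%) would end that way. The sketch below uses invented times and a function name of my own choosing.

```python
def share_ending_in_0_or_5(times):
    """Fraction of whole-second times whose final digit is 0 or 5."""
    if not times:
        raise ValueError("no observations")
    return sum(1 for t in times if t % 5 == 0) / len(times)


# Hypothetical reported times in seconds, invented for illustration only.
reported = [15, 20, 30, 45, 18, 60, 90, 21, 30, 120]

share = share_ending_in_0_or_5(reported)
print(f"{share:.0%} of times end in 0 or 5 (about 20% expected from evenly spread digits)")
```

A share far above 20% does not prove that times were estimated, but it gives students a concrete reason to ask how the measurements were actually taken.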

One semester on the first day, a student asked for clarification about what to measure. The class discussed what should be measured. This class used the same data collection form, and the students still responded to the same four statements the day they turned in their data. About 80 to 90 percent of the students in this section used similar starting and ending points. This provided an opportunity to discuss why fewer than 100% of the students used the same definitions and to compare the data collected in this section to data collected by another section.

Student responses: As a result of this exercise, students question the data provided for many of the homework assignments in their textbook. Recently, when a large fast food chain started advertising that cars would be served in the drive-thru lane within a specified number of seconds, students in the current class as well as students from previous semesters raised questions. They wanted to know how the time was going to be measured. In addition, they developed theories about how the employees were going to alter their behavior in order to obtain the desired measurements. These students were not surprised when the advertisements quietly and quickly disappeared!

3. Summary

Data collection underlies most statistical techniques. When students recognize data collection as a process and begin to apply concepts of statistical thinking to this process, they begin to look at statistics in a different way. Rather than seeing statistics as an isolated quantitative course, they begin to see statistics as a tool used in many disciplines.

When these exercises are used early in the semester, students tend to raise more questions throughout the semester. Many of these questions relate to understanding the underlying situation. As a consequence, students develop an appreciation for the need for subject matter knowledge coupled with statistical techniques. In addition, these students more clearly discuss how data were obtained or point to potential problems in data collection when they communicate results of statistical studies.

Since many reports include numerical results that are compiled from individual reports created by different people, students begin to develop a degree of skepticism for media reports. They begin to recognize that until people understand the need for uniform reporting standards, summary reports are likely to be misleading. These exercises, used individually or in combination across a course, help students become better data collectors. In addition, they help students become better consumers of reports.

References

American Society for Quality (1985), Quality Illustrated, Milwaukee, WI: Author.

American Society for Quality (1996), Glossary of Statistical Terms, Milwaukee, WI: Author.

Chance, B. L. (2002), "Components of Statistical Thinking and Implications for Instruction and Assessment," Journal of Statistics Education [Online], 10(3). jse.amstat.org/v10n3/chance.html

Ishikawa, K. (1986), Guide to Quality Control (2nd ed.), White Plains, NY: Quality Resources.

Scheaffer, R., Gnanadesikan, M., Watkins, A., and Witmer, J. (1996), Activity-Based Statistics, New York: Springer-Verlag Publishers.

Wild, C. J. (1994), "Embracing the 'Wider View' of Statistics," The American Statistician, 48, 163-171.

Wild, C. J., and Pfannkuch, M. (1999), "Statistical Thinking in Empirical Enquiry," International Statistical Review, 67, 223-265.

Kim I. Melton
Dillard Munford Chair of Management
Business Administration Department
North Georgia College & State University
Dahlonega, GA 30597
kmelton@ngcsu.edu