# A Coin-Flipping Exercise to Introduce the P-Value

Nicholas P. Maxwell
University of Washington, Bothell

Journal of Statistics Education v.2, n.1 (1994)

Copyright (c) 1994 by Nicholas P. Maxwell, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

## Abstract

The p-value can be introduced with a coin-flipping exercise. The instructor flips a coin ten times and has a student call each flip. The students record their thoughts after each flip. Regardless of the actual outcomes, the instructor reports that the caller calls every flip correctly. In this exercise, students intuitively reject a null hypothesis because the p-value is too small. Students are reassured to learn from this concrete example that they intuitively followed the logic of statistical inference before they studied statistics.

1 The meaning of the p-value is essential for understanding statistical inference. Nonetheless, many students have trouble keeping track of what a p-value is. It is common to confuse the p-value with the probability that the null hypothesis is right, or with the probability that the alternative, experimental hypothesis is wrong (Phillips 1971, p. 80; Freedman et al. 1991, p. 435).

2 The usual method of introducing the p-value is to lead the students through the steps of a t-test, explaining the jargon that identifies each element in the procedure (e.g., Witte 1993, p. 303; Freedman et al. 1991, p. 435; Wonnacott and Wonnacott 1990, p. 293; Winkler and Hays 1975, p. 443; Moore and McCabe 1989, p. 466; Hildebrand 1986, p. 333). Such a presentation is usually very precise in stating what the p-value is (i.e., the probability of getting a t statistic at least as extreme as the one observed, if the null hypothesis is true). Yet students may find it hard to keep clear, because it does not show why the decision rule is sensible. As if they were Bayesians, students seem to yearn for the probability that the experimental hypothesis is right. Such a probability would support a very intuitive decision rule: accept the experimental hypothesis if it is very likely to be true. Instead of such a clearly sensible inferential rule, introductory texts appear to describe something arcane and peculiar.

3 Actually, the decision rule underlying statistical inference is very natural, and need not seem at all arcane to students. Students could see how reasonable statistical inference is if they saw that people often intuitively think the same way (Hong and O'Neil 1992).

4 Some alternatives to the traditional introduction to significance testing have been suggested. Morgan and Morgan (1984) describe the use of Monte Carlo trials to illustrate the p-value, but their exercise does not seem to reveal the intuitive nature of significance testing. Bonsangue (1992) does focus on the intuitive nature of significance testing; he presents elaborate classroom activities that illuminate how people naturally try to minimize Type I and Type II errors. In contrast to Bonsangue's methods, the technique presented here is efficient and requires only minimal materials.

5 Popham and Sirotnik (1992) provide a compelling example of an intuition that follows the logic of statistical inference. They write:

> Suppose Joe and June are betting for cups of coffee on the basis of a tossed coin, with the loser buying. Joe does all the coin flipping, and June decides to call tails every time, figuring to win approximately half of the cups of coffee. If the coin turns up heads ten times in a row, June ... might begin to suspect that there is something suspicious about the coin -- or the person flipping it! (Popham and Sirotnik 1992, p. 48)

Anyone can follow this story, even without a background in statistics or probability. This example can be brought into the classroom, and students can be placed in June's position. They can see for themselves that, like June, they would reject an idea when something happens that would have been very unlikely had the idea been true.

6 I use a classroom version of Popham and Sirotnik's (1992) example. To introduce statistical inference, I flip a coin ten times and have a student call each flip. After each flip, I report how the caller did and then ask the class to record their thoughts about what is happening. While the students record their thoughts, I record on the blackboard whether the caller was correct, leaving space to add the students' thoughts later. The trick for teaching statistical inference is that I lie: I report that the caller calls every flip correctly.

7 Initially, this is a fairly bland exercise, but as the caller correctly calls three, four, and then five flips in a row, the class quickly becomes extremely agitated. Some students get so excited that I must ask them to be quiet so that others can write down their thoughts as the exercise progresses. During the exercise, I try to act surprised, but a great acting job is not necessary. The exercise depends on the students figuring out that something funny is going on, so if my expression betrays the ruse, no harm is done. After the last flip, I ask the students what they thought and when they thought it, and I write their thoughts about each flip on the blackboard. In my classes, all of the students report that by the tenth flip they decided something funny was going on.

8 To guide the students to discover their decision rule in this situation, I ask several questions. "What can you conclude about what was happening?" The students report that they concluded that I was lying or that I had prearranged some sort of magic trick with the caller. "Why did you make that conclusion?" They report that they made this conclusion because there is too small a chance that the caller could call ten coin flips in a row correctly. When pressed, they explain that if the coin is fair, if I am not lying, and if the caller is not telepathic, then it is extremely unlikely that the caller would call ten flips correctly. I ask, "Seeing that there are several aspects to your initial conception of what was happening, what is the best way to summarize your conclusions?" Most agree (with some prompting) that all they can conclude from the experiment itself is that at least one aspect of their initial conception is likely to have been false; to reach a final conclusion, they have to consider what they know about each of the assumptions to decide which to discard.
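The students' "extremely unlikely" can be put into numbers. Writing $H_0$ for their initial conception (a fair coin, an honest reporter, and a caller with no special powers), and assuming, as the students implicitly do, that each call is an independent guess that is right with probability 1/2, the chance of ten correct calls in a row is:

$$
P(\text{10 correct calls} \mid H_0) = \left(\tfrac{1}{2}\right)^{10} = \frac{1}{1024} \approx 0.00098
$$

That is roughly one chance in a thousand.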

9 I then ask why they didn't decide something funny was going on after two flips or three flips, and why they didn't wait until eleven flips before leaping to conclusions. Most students report that three correct calls is not unlikely enough, and that there is no need to wait for eleven correct calls, because ten is already much too unlikely. I point out that they seem to have some sort of cut-off for unlikeliness: anything less likely than that, and they doubt their initial conceptions. Most students agree that this makes sense.
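The students' informal cut-off can be made concrete. The following Python sketch assumes, as the students did, that each call is an independent 50/50 guess; it tabulates the probability of k correct calls in a row and finds the shortest run whose probability falls at or below a chosen cut-off. The function names and the cut-off values used below (0.05, 0.01, 0.001) are illustrative choices, not part of the classroom exercise.

```python
from fractions import Fraction
from typing import Optional


def p_value_all_correct(k: int) -> Fraction:
    """Probability that k of k independent fair calls are all correct: (1/2)**k.

    When every call is correct, no outcome is more extreme, so this is also
    the one-sided p-value for the observed run under the null hypothesis.
    """
    return Fraction(1, 2) ** k


def first_run_below(alpha: float, max_flips: int = 20) -> Optional[int]:
    """Shortest run of correct calls whose p-value is at or below alpha."""
    for k in range(1, max_flips + 1):
        if p_value_all_correct(k) <= alpha:
            return k
    return None  # no run up to max_flips is unlikely enough


if __name__ == "__main__":
    # Table of exact p-values for runs of 1 to 11 correct calls.
    for k in range(1, 12):
        p = p_value_all_correct(k)
        print(f"{k:2d} correct: p = {p} ~ {float(p):.5f}")
    for alpha in (0.05, 0.01, 0.001):
        print(f"alpha = {alpha}: first run at or below alpha is {first_run_below(alpha)}")
```

Under these assumptions, three correct calls give p = 1/8 = 0.125, far above any conventional cut-off, while ten give p = 1/1024. With a cut-off of 0.05 a run of five (p = 1/32) would already qualify; with a cut-off of 0.001, ten is the shortest run that does.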

10 I then label the elements of their thinking in statistical jargon. I point out that they started with a particular idea of what was going on: the coin was fair, the caller was not clairvoyant, and I was reporting the caller's successes honestly. I explain that an initial conception of what is going on is called the "null hypothesis," that the probability of results at least as extreme as those observed, if the null hypothesis is true, is called the "p-value" (here, ten correct calls out of ten is the most extreme possible result, so the p-value is simply the probability of that run), and that the cut-off used to decide whether to reject the null hypothesis is called "alpha."
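These three terms can also be tied together in a small simulation. In the spirit of the Monte Carlo trials that Morgan and Morgan (1984) describe (this is not their program), the Python sketch below estimates the p-value for ten correct calls by simulating many runs of ten honest 50/50 calls under the null hypothesis and counting how often all ten come out correct. It then applies the decision rule "reject the null hypothesis when the p-value is at or below alpha." The function names, trial count, seed, and default alpha of 0.05 are hypothetical choices made for illustration.

```python
import random


def simulate_p_value(n_calls: int = 10, n_trials: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(all n_calls correct | null hypothesis).

    Under the null hypothesis, each call is an independent guess that is
    correct with probability 1/2.
    """
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        # One simulated run; all() stops at the first wrong call, which is
        # enough to know the run was not "all correct".
        if all(rng.random() < 0.5 for _ in range(n_calls)):
            hits += 1
    return hits / n_trials


def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Decision rule: doubt the null hypothesis if the result is too unlikely."""
    return p_value <= alpha


if __name__ == "__main__":
    p_hat = simulate_p_value()
    exact = 0.5 ** 10
    print(f"simulated p ~ {p_hat:.5f}, exact p = {exact:.5f}")
    print("reject null hypothesis:", reject_null(p_hat))
```

Because the estimate comes from random simulation, it will vary somewhat around the exact value 1/1024 from one seed to another; what matters for students is that it lands far below any conventional cut-off.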

11 I use this exercise just before introducing significance testing and then refer back to it throughout the course whenever students seem to have lost track of what the p-value is. Because the exercise is very memorable, it takes only brief reminders to bring students back to a clear understanding of the p-value.

12 Doing the exercise before presenting the ideas of significance testing works for some students; some students enjoy being allowed to discover the logic of significance testing for themselves. Other students might benefit more from having the exercise directly after learning the procedures and jargon of significance testing. To reach all students, it may be ideal to have two such exercises, and to use one just before introducing the jargon, and a second just afterwards. In this way, the technique presented here could very nicely complement the technique presented by Eckert (1994).

13 Most of my students seem to feel that significance testing is like thinking backwards, but in this exercise they follow the logic of significance testing without any special effort. That this decision rule is intuitive and that they used it before they studied it in class is very important for many students. They are reassured to learn that the logic of statistical inference is quite natural.

## Acknowledgements

Thanks go to three anonymous referees for very helpful comments on an earlier draft of this paper. Thanks for comments also go to Rachel Maxwell, Janet Robertson, and James Robertson.

## References

Bonsangue, M. V. (1992), "Is it true that 'blonds have more fun'?," Mathematics Teacher, 85, 579-581.

Eckert, S. (1994), "Teaching Hypothesis Testing With Playing Cards: A Demonstration," Journal of Statistics Education, v.2, n.1.

Freedman, D., Pisani, R., Purves, R., and Adhikari, A. (1991), Statistics (2nd ed.), New York: W. W. Norton.

Hildebrand, D. K. (1986), Statistical Thinking for Behavioral Scientists, Boston: Duxbury Press.

Hong, E., and O'Neil, H. F., Jr. (1992), "Instructional strategies to help learners build relevant mental models in inferential statistics," Journal of Educational Psychology, 84, 150-159.

Moore, D. S., and McCabe, G. P. (1989), Introduction to the Practice of Statistics, New York: W. H. Freeman.

Morgan, L. A., and Morgan, F. W. (1984), "Personal computers, p-values, and hypothesis testing," Mathematics Teacher, 77, 473-478.

Phillips, J. L. (1971), How to Think about Statistics (revised ed.), New York: W. H. Freeman.

Popham, W. J., and Sirotnik, K. A. (1992), Understanding Statistics in Education, Itasca, Illinois: F. E. Peacock Publishers.

Winkler, R. L., and Hays, W. L. (1975), Statistics: Probability, Inference, and Decision (2nd ed.), New York: Holt, Rinehart, and Winston.

Witte, R. S. (1993), Statistics (4th ed.), Fort Worth: Harcourt Brace Jovanovich College Publishers.

Wonnacott, T. H., and Wonnacott, R. J. (1990), Introductory Statistics (5th ed.), New York: John Wiley and Sons.

Nicholas P. Maxwell
University of Washington, Bothell
Department of Liberal Studies, XB-05
22011 26th Avenue S. E.
Bothell, WA 98021
NMaxwell@U.Washington.edu