# What is the Probability of a Kiss? (It's Not What You Think)

Mary Richardson
Grand Valley State University

Susan Haller
Saint Cloud State University

Journal of Statistics Education Volume 10, Number 3 (2002), jse.amstat.org/v10n3/haller.html

Copyright © 2002 by Mary Richardson and Susan Haller, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Active learning; Advanced Placement Statistics; Probability; Sampling distribution of a sample proportion; Simulation.

## Abstract

This paper begins by describing two hands-on activities developed for teaching basic statistical concepts to junior high students. Through generating, collecting, displaying, and analyzing data, students are given the opportunity to explore a variety of descriptive statistical techniques and develop an understanding of the distinction between theoretical, subjective, and empirical (or experimental) probabilities. These activities are then extended to introduce the sampling distribution of a sample proportion. The extension is appropriate for use in grades 9 through 12, in an Advanced Placement (AP) Statistics course, or in an introductory statistics course at the undergraduate level.

## 1. Part I: What is the Probability of a Kiss?

### 1.1 Introduction

A group of junior high students was asked the probability of a flipped coin showing heads upon its landing. The unanimous answer was one-half or 50-50. When asked to justify their answers, invariably the response was that the coin has two sides and only one of them shows a head. "What about this HERSHEY'S KISS?" was asked. "It also has two sides. Is the probability of it landing on its base 50-50 or one-half?" This begins an experiment in which students cannot simply look at an object and determine the probability for any specific outcome.

The National Council of Teachers of Mathematics (NCTM) Standards state that an understanding of probability and statistics is essential to being an informed citizen (NCTM 1989). Shaughnessy and Bergman (1993) note that "Perhaps there is no other branch of mathematics as important for all students ... as probability and statistics." In real-life situations, students will be asked to make decisions based upon data they or others collect and analyze, and they will need to properly interpret this information in order to make educated decisions.

Junior high students know the probabilities assigned to various events such as tossing coins, rolling dice and spinning spinners. However, most of their experiences with probabilistic situations are limited to objects that can be assigned theoretical probabilities, that is, they have outcomes that are equally likely. Most real-life situations, however, cannot be assigned a theoretical probability. For example, prescribing medical treatments, determining insurance rates, and forecasting weather all require probabilistic assessments that usually do not involve equally likely outcomes. In addition to studying theoretical probability, junior high students should have extensive experience with subjective probability and empirical (or experimental) probability.

Along with a variety of probability experiences, junior high students should be provided ample opportunities to explore the related topic of statistics. In particular, students should be encouraged to collect, organize, and describe data, construct appropriate charts and graphs to summarize the data, make predictions and conclusions based on the data, and test these predictions and conclusions (NCTM 1989, 2000).

The following sections describe in detail two activities that have been used with junior high students. The first activity can be used to help students develop probabilistic reasoning in a situation that has no associated theoretical probability. The second activity allows students to develop a hypothesis and demonstrates the importance of testing hypotheses prior to generalizing sample results to populations. The estimated completion time for these activities is a one-hour class period.

### 1.2 Activity 1

Background

Students enjoy collecting and analyzing data (especially, it seems, when candy is involved). In this activity, students explore the empirical probability that a plain HERSHEY'S KISS will land on its flat base when spilled from a cup. This activity works best if students have had prior experience calculating measures of center.

Materials

Students work in groups of three. Each group has ten plain HERSHEY'S KISSES chocolates, a 16-ounce plastic cup, a flat table or desktop on which to work, and each student has a sticky note and a copy of the Activity 1 Worksheet (see Appendix A).

Procedure

To begin the lesson, each student takes one plain HERSHEY'S KISS and examines it. (We tell them they will get to eat the candy later.) We discuss the possible outcomes if the candy were tossed onto the table. "It can land upright" (on its base) and "It can land sideways" are the two common responses. However, another enthusiastic response has included, "I could toss it just right and it could just happen to land in my mouth!" For the purpose of this experiment, the class should agree that there are two possible outcomes for tossing the candy - landing on the base and landing on the side.

Each student is then asked to estimate the probability that a plain KISS will land on its base when tossed. The students should write down their estimates, then share their estimates and the reasoning behind them. Estimates usually range from 20% to 80%. "There's more room on the sides," explained a student who selected 40% as her probability - "It is more likely to land there than on the bottom." "It just seems like it will tip over more often and land on its side," said another. A student who selected a probability higher than 50% reasoned, "The base is all flat, so it will be more likely to land there and stay."

It is important to note that, even though it is agreed that there are two outcomes, very few students select exactly 50% as their estimate. Students explain that there are two outcomes, but it doesn't seem like they will happen with the same frequency. This leads into a discussion about the three different types of probability (theoretical, subjective, and empirical), and students recognize that this is a situation for which there is no theoretical probability. Because we have no reason to believe that the two outcomes for the candy are equally likely, we must assign probabilities based upon what we think will happen (subjective probability).

After students share their subjective estimates, they are ready (and eager) to test them by conducting an experiment. We agree that, rather than tossing candies up in the air as we would a coin, we will spill them from a cup. Each group puts all ten candies into their plastic cup, and the following tasks are assigned:

• One group member spills the candies onto the table ten times, each time counting the number that land on their base.

• A second group member helps the spiller count the candies.

• The third group member records the results on the Activity 1 Worksheet.

After one student has spilled the cup ten times, the assigned duties rotate among group members. We explain to the class that, because we are all conducting the same experiment, we should all try to shake the cup and spill the candies in a similar fashion. We suggest they gently shake the cup twice and spill the cup quite close to the table so that the candies do not break.

After completion of tossing the KISSES, each student will have performed a total of ten trials. In Table 1 we have included an example of typical outcomes for the tosses of the candies.

Table 1. Example results for the KISSES tosses.

| Toss Number | Number of KISSES Landing on Base |
|---|---|
| 1 | 5 |
| 2 | 7 |
| 3 | 2 |
| 4 | 3 |
| 5 | 1 |
| 6 | 4 |
| 7 | 3 |
| 8 | 3 |
| 9 | 7 |
| 10 | 4 |
| Total | 39 |

Once all students have a chance to conduct the experiment, we ask them to refine their original guesses. Many students use only the total number of base landings from their own experiment while other students combine their total with members of their group. When asked how they could be more certain of the probabilities, most respond that a longer experiment (more tosses) would result in more certainty.

Students recognize that it is unrealistic for each of us to spill the candies a very large number of times. Candies might break or melt, and it would take too much time. Instead, we decide to use the results from the entire class to get a better estimate for the probability. In order to combine data, we must agree that the candies are essentially identical. Each student writes his or her own total on a sticky note and arranges the note on the whiteboard in order to form a frequency plot or a stem-and-leaf plot. After we have discussed the data for the whole class, everyone makes a new estimate for the probability.

Students should be asked to share their new probability estimates. Most calculate the relative frequency (empirical probability) of base landings for the class and use this value as their estimate. Others calculate the relative frequency and adjust the value to get an answer that they think will be more reasonable. For example, one student adjusted the empirical probability of 37% to a subjective probability of 40%, explaining that she was more comfortable assigning an "even" number and she just thought it should be higher than 37%. Other students select the median or mode of the base landings for the class as their new estimate.
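The relative-frequency calculation can be sketched in a few lines of Python. This is only an illustration using the single student's counts from Table 1; the variable names are our own, and a class would pool every student's totals in the same way.

```python
# Counts from Table 1: KISSES (out of 10) landing on their base in each spill.
base_landings = [5, 7, 2, 3, 1, 4, 3, 3, 7, 4]
candies_per_spill = 10  # ten plain KISSES in the cup for every spill

total_landings = sum(base_landings)
total_tossed = candies_per_spill * len(base_landings)

# Empirical probability = relative frequency of base landings.
empirical_probability = total_landings / total_tossed

print(f"{total_landings} of {total_tossed} landed on base: {empirical_probability:.2f}")
```

For this one student the estimate is 39 out of 100, or 0.39.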

In past experiments, the empirical probability for a tossed plain KISS landing on its base has consistently been near 35%.

The important concept in this activity is that one cannot simply count outcomes for every object and assign a theoretical probability to any given outcome. Specifically, by exploring objects that cannot be assigned theoretical probabilities, students seem to have a greater understanding of when a theoretical probability can be assigned. Students can also appreciate the need for a way to obtain a good estimate of a probability in the absence of being able to assign a theoretical probability. This establishes a foundation for a discussion of empirical probabilities. Students recognize the need to generate data in order to estimate probabilities and discover some limitations of generating data. In this activity, we spilled ten candies at a time in order to expedite data collection. However, spilling this number of candies might yield different results than had the candies been spilled one at a time. Students will also note that results varied within and between groups, even though all candies appear identical and were spilled in the same manner. This point is addressed in Part II of this paper.

### 1.3 Activity 2

Background

In real-life situations, experiments are conducted on samples from one population and results are often generalized to different populations. The second activity allows students to make predictions about the probability of an almond HERSHEY'S KISS landing on its base when spilled from a cup, after having experimented with the plain KISSES. This activity works best if students have had previous experience with five-number summaries (minimum, first quartile, median, third quartile, and maximum), stemplots (or stem-and-leaf plots), and boxplots (or box-and-whisker plots).

Materials

Students continue working in groups of three. In addition to the materials used in the first experiment, each group should have ten almond HERSHEY'S KISSES and each student should have two sticky notes (different colors) and a copy of the Activity 2 Worksheet (see Appendix A).

Procedure

Each student is asked to compare one plain and one almond HERSHEY'S KISS and note any differences between them. Students usually notice that the wrapper colors differ, the base of the plain candy is smaller and has edges that are more tapered than the base of the almond candy, and the plain candy seems to be more slender than the almond candy. Of these differences, students should note that the wrapper color is not likely to affect the probability of the candy landing on its base, but the shape of the candy and the size of the base might. Further, students note that the almond candy contains pieces of almonds, but it is difficult to determine how this might affect the outcome.

Students are asked to estimate the probability that the almond KISS will land on its base when spilled in the same manner as the plain KISS (in Activity 1). Almost without exception, students estimate that the almond candy is more likely than the plain candy to land on its base. Estimates for the almond candy usually range from 40% to 80%. Students cite the base size and candy shape as reasons for their conjecture. Actually, this is very logical reasoning. Most adults come to the same conclusion after comparing the two candies. Try it yourself!

The students are now ready to test their hypothesis that the almond KISS will land on its base more often than the plain KISS. All twenty candies should be placed in the cup, the cup should be gently shaken twice, and the candies spilled onto the desktop. We stress that all twenty candies should be spilled at the same time so that each type of candy is tossed in the same manner. The student who spills the candy and a second student should count the number of plain and almond candies that land on their bases, and the third student should record the results for each type of candy. Be sure that students count carefully, because it is easy to overlook candies. After one student has spilled the candies ten times, the students should rotate responsibilities. When each group member has completed the experiment, each student should total the number of times (out of 100) each type of candy landed on its base.

Students are likely to encounter "messy data" when conducting this activity. Candies are apt to lean against one another and may not land fully on the table. When this happens we suggest that students move any of the candies that keep other candies from landing. Candy might also fall off of the table. We suggest that these be spilled again. These situations lead into an important discussion of data collection: it is not always a nice, neat process. Sometimes, the investigator must decide how to handle data values that do not conform to expectations.

In Table 2 we have included an example of typical outcomes for the tosses of the KISSES.

Table 2. Example results for the KISSES tosses.

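| Toss Number | Number of Plain KISSES Landing on Base | Number of Almond KISSES Landing on Base |
|---|---|---|
| 1 | 7 | 4 |
| 2 | 3 | 5 |
| 3 | 2 | 4 |
| 4 | 4 | 3 |
| 5 | 8 | 5 |
| 6 | 4 | 4 |
| 7 | 4 | 2 |
| 8 | 5 | 3 |
| 9 | 3 | 2 |
| 10 | 3 | 1 |
| Total | 43 | 33 |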

Class data should be recorded on the whiteboard. Students should write their totals on sticky notes (agree ahead of time which color represents each candy) and arrange them on the whiteboard, either as a back-to-back stemplot or as two frequency plots on the same scale. In Figure 1, we have included an example of a back-to-back stemplot for typical class results for the KISSES tosses.

```
                  Plain |   | Almond
                        | 1 | 8 9
                  8 8 4 | 2 | 0 2 4 6 6 7 8 8 8
9 8 8 7 6 6 5 4 2 2 1 0 | 3 | 0 0 1 3 3 3 6
                  3 3 0 | 4 |
```

Figure 1. Stemplot for KISSES tosses.

Every time that we have conducted this experiment, the plain candies land on their base more often than the almond candies. Students get a feel for this counterintuitive result by looking at the class data and the focus now shifts to what questions can be answered by analyzing the class results. A great starting point for analyzing the two datasets is to have students work in their groups to find the five-number summary of the class data for each type of candy. The five-number summaries for the example class data are shown below in Table 3.

Table 3. Five-number summaries for class examples.

| Summary | Plain | Almond |
|---|---|---|
| Minimum | 24% | 18% |
| Quartile 1 | 31% | 24% |
| Median | 35.5% | 28% |
| Quartile 3 | 38% | 31% |
| Maximum | 43% | 36% |
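As a check on the hand calculations, the five-number summaries can be sketched in Python from the percentages read off the stemplot in Figure 1. The function name is illustrative, and the quartile convention used here (each quartile is the median of the lower or upper half of the sorted data) is an assumption on our part; it reproduces the values in Table 3, but software packages may use other conventions and give slightly different quartiles.

```python
from statistics import median

# Percentages of base landings read from the back-to-back stemplot (Figure 1).
plain = [24, 28, 28, 30, 31, 32, 32, 34, 35, 36, 36, 37, 38, 38, 39, 40, 43, 43]
almond = [18, 19, 20, 22, 24, 26, 26, 27, 28, 28, 28, 30, 30, 31, 33, 33, 33, 36]

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # values below the median position
    upper = data[-half:]   # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

print("Plain: ", five_number_summary(plain))
print("Almond:", five_number_summary(almond))
```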

Next, students are asked to construct boxplots. Within each type of candy, the boxplots can be used to make a class decision as to what should be the claimed percentage of base landings. Between each type of candy, the boxplots can be used to compare the percentages of base landings. Comparative boxplots for the example class data are displayed in Figure 2.

Figure 2. Boxplots for example class data.

For the plain KISSES the median percentage of base landings is 35.5% which agrees with the estimate of 35% obtained in Activity 1. For the almond KISSES the median percentage of base landings is considerably lower at 28%.

In past experiments, the empirical probability for a tossed almond KISS landing on its base has consistently been near 30%.

To compare the plain and almond, note that the third quartile for the almond base landings, 31%, is the same as the first quartile for the plain, indicating that 75% of the almond tosses resulted in no more than 31% landing base-up, while 75% of the plain tosses resulted in at least 31% landing base-up. The interquartile range (IQR), the difference between the first quartile and the third quartile, can be used to compare variability. Students get a visual feel for the IQR by noting that it is simply the length of the "box" in the boxplot. The IQR for both plain and almond is 7 percentage points.

As students construct the boxplots, they should assess both datasets for outliers. Using the conventional formula that regards outliers as those values lying more than 1.5 times the IQR below the first quartile or above the third quartile, we find that neither of our datasets has outlying values.
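The outlier check can be sketched as follows, using the quartiles from Table 3 and the percentages from Figure 1. The helper names are our own; the rule itself is the 1.5 × IQR convention described above.

```python
# Percentages of base landings read from the back-to-back stemplot (Figure 1).
plain = [24, 28, 28, 30, 31, 32, 32, 34, 35, 36, 36, 37, 38, 38, 39, 40, 43, 43]
almond = [18, 19, 20, 22, 24, 26, 26, 27, 28, 28, 28, 30, 30, 31, 33, 33, 33, 36]

def outlier_fences(q1, q3):
    """Return the (lower, upper) fences located 1.5 * IQR beyond the quartiles."""
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)

def find_outliers(values, q1, q3):
    """Return the values lying outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Quartiles taken from Table 3.
print(outlier_fences(31, 38), find_outliers(plain, 31, 38))
print(outlier_fences(24, 31), find_outliers(almond, 24, 31))
```

With an IQR of 7 for both candies, the fences are 20.5% and 48.5% for plain and 13.5% and 41.5% for almond, and no observation falls outside them.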

After completion of the construction and interpretation of the boxplots, a possible numerical extension is to have students calculate means and standard deviations for the percentage data. In this example, the means for plain and almond are 34.67% and 27.33%, respectively, and the sample standard deviations for plain and almond are 5.26% and 5.17%, respectively.
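For classes with access to software, the same numerical extension can be sketched with Python's standard `statistics` module. The data are again the percentages read from Figure 1; `stdev` computes the sample standard deviation (divisor n - 1).

```python
from statistics import mean, stdev

# Percentages of base landings read from the back-to-back stemplot (Figure 1).
plain = [24, 28, 28, 30, 31, 32, 32, 34, 35, 36, 36, 37, 38, 38, 39, 40, 43, 43]
almond = [18, 19, 20, 22, 24, 26, 26, 27, 28, 28, 28, 30, 30, 31, 33, 33, 33, 36]

for label, data in (("Plain", plain), ("Almond", almond)):
    # Mean and sample standard deviation of the class percentages.
    print(f"{label}: mean = {mean(data):.2f}%, SD = {stdev(data):.2f}%")
```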

Summary

We ask students to answer the following questions to help summarize their findings.

• Should everyone in the class have the same five-number summaries? Why or why not?

• How do the five-number summaries for the two datasets compare?

• Why is it important that the ten plain and ten almond candies be spilled at the same time?

• When you estimated the probability for the almond candies, you made a generalization based on what you knew about the plain candies. Why is it important to test generalizations?

• Analyze your boxplots and write at least two statements comparing the outcomes for the two different candies.

• Do you think the results of this experiment would change if we were to increase or decrease the number of tosses? Explain your reasoning.

### 1.4 Discussion

These activities are very well received by students and are a great way to bring life to a discussion of several topics in basic descriptive statistics. Perhaps the most remarkable aspect of these activities is the empirical result of a lower percentage of base landings for the almond candies, despite the fact that the bases of the almond candies are noticeably wider than the bases of the plain candies. The data collected on base landings for the almond candies is counterintuitive. This provides an excellent example of results that apply to one subgroup which should not be generalized or adjusted to make inferences about a similar, but different subgroup, even if intuition says that the subgroups are relatively the same except for some seemingly small characteristic. Making generalizations based on previous experiments is very common in real life. For example, pharmaceutical experiments may be performed on males and the results generalized to females, or tests are conducted on animals and results are generalized to human beings. Students have made similar (possibly incorrect) generalizations with their candies and see the importance of testing these generalizations. A good source for examples of experimental and study results that have been generalized is the Reuters Health Information directory at the Web site www.organoninc.com.

Another valuable aspect of these activities is that they provide students with a concrete example of a situation in which the assignment of a theoretical probability is not possible. A subjective probability can be assigned to the probability of a base landing for an almond KISS after having seen the results for the plain KISSES. We ask students to make this subjective assignment prior to performing the tosses for the almond candies. After realizing that their prior subjective probability assignment for the almond candies is too high (which typically is the case for most students), students can appreciate the need for a way to obtain a good estimate of the probability for the almond candies in the absence of being able to assign a theoretical probability. This lays the foundation for a discussion of empirical probabilities.

After completing these activities, additional questions are typically raised by students. For example, students might state that they think candies are more likely to land on their base if the cup is pulled across the table as candies are spilled (instead of spilling them all in one spot), or that the results might change if the candies are spilled from a higher distance or spilled onto carpeting rather than a hard desktop. To this, we suggest that students test their hypotheses!

## 2. Part II: What can be Inferred from a Kiss?

### 2.1 Introduction

According to the NCTM (2000), instructional programs should enable all students to develop and evaluate inferences that are based on data. In particular, students in grades 9 through 12 should be able to use simulations to explore the variability of sample statistics from a known population and to construct sampling distributions, understand how sample statistics reflect the values of population parameters, and use sampling distributions as the basis for informal inference. These topics are also salient in Advanced Placement (AP) Statistics. One of the four major themes of an AP Statistics course is statistical inference and the outline of major topics covered by the AP Statistics Examination includes sampling distributions. In particular, students are expected to have an understanding of the sampling distribution of a sample proportion and the simulation of sampling distributions.

In this section, we discuss an activity that extends the concepts developed in Part I in order to introduce sampling distributions for proportions. The activity is completed in two parts. The first part of the activity is completed interactively in the classroom. The second part of the activity is assigned as homework. The activity was designed for use in an introductory statistics course at the undergraduate level. However, with minor (or no) revision, it may be used in a grades 9 through 12 class or in an AP Statistics course. The estimated interactive completion time for the first part of this activity is one and a half one-hour class periods.

### 2.2 Activity 3

Background

In this activity, students explore the properties of the distribution of a sample proportion. Initially, there are two unknown proportions. These are the proportion of base landings for tossed plain and almond HERSHEY'S KISSES.

Materials

Students work in groups of three. Each group has 30 plain HERSHEY'S KISSES, 30 almond HERSHEY'S KISSES, a 16-ounce plastic cup, a flat table or desktop on which to work, and a copy of the Data Collection Sheets and the Activity 3 Worksheet (see Appendix B).

Procedure

Students should reflect on the results of their experiment in Part I of this paper. Specifically, they should note that, although each person tossed the candies the same number of times, there was a wide range of results. For example, for the plain candies, some students had as few as 24% landing base-up, while others had as high as 43%. Students should be asked to think about whether the percentages of base landings have a specified distribution and whether they think that the number of KISSES tossed affects the shape (mound-shaped for the base landings displayed in Figure 1) or the mean and standard deviation of this distribution. For example, for the plain KISSES, we saw that, for 100 candies tossed, the mean and standard deviation of the percentages of base landings were 34.67% and 5.26%, respectively. If only 20 candies were tossed, rather than 100, would the mean percentage still be near 35%? Would the standard deviation still be around 5%, or would it be lower or higher? What if 200 candies were tossed? How, if at all, would this change the results?

To start Activity 3, each group puts ten plain HERSHEY'S KISSES into the plastic cup, and the following tasks are assigned to the group members.

• One group member spills the KISSES onto the table five times, each time counting the number of candies that land on their base.

• A second group member helps the spiller count the candies.

• The third group member records the results on the Data Collection Sheets.

After one student has spilled the contents of the cup five times, the assigned duties rotate among group members. After each group member has completed spilling the ten plain KISSES five times, the process is repeated for 20 and 30 plain candies in the cup.

After completion of tossing the plain candies, each student will have performed a total of fifteen tosses (five for each of the three sample sizes). In Table 4 we have included an example of typical individual outcomes for the tosses of the plain KISSES.

Table 4. Example individual results for the plain KISSES tosses.

| Toss Number | Number of KISSES Landing on Base (n = 10) | Number of KISSES Landing on Base (n = 20) | Number of KISSES Landing on Base (n = 30) |
|---|---|---|---|
| 1 | 7 | 7 | 10 |
| 2 | 3 | 10 | 12 |
| 3 | 2 | 5 | 11 |
| 4 | 4 | 5 | 13 |
| 5 | 8 | 11 | 15 |

Recall that in past experiments, the proportion of the time that a plain KISS lands on its base when tossed has consistently been near 35%.

Next, the tossing procedure is repeated for the almond candies. In Table 5, we have included an example of typical outcomes for the tosses of the almond KISSES.

Table 5. Example individual results for the almond KISSES tosses.

| Toss Number | Number of KISSES Landing on Base (n = 10) | Number of KISSES Landing on Base (n = 20) | Number of KISSES Landing on Base (n = 30) |
|---|---|---|---|
| 1 | 4 | 6 | 5 |
| 2 | 5 | 5 | 7 |
| 3 | 4 | 6 | 9 |
| 4 | 3 | 6 | 9 |
| 5 | 5 | 6 | 10 |

Recall that in past experiments, the proportion of the time that an almond KISS lands on its base when tossed has consistently been near 30%.

In order to complete the Data Collection Sheets, individual results within each group are combined to obtain 15 tosses (repetitions) for each of the sample sizes of n = 10, 20, and 30 for plain and almond KISSES. Next, individual results within each group are combined to obtain five tosses for sample sizes of n = 60 and n = 90 for each of the plain and the almond candies (by adding each group member's results for tosses of 20 and 30 candies). Finally, each group is asked to merge their results with two other groups to obtain 15 tosses for the sample sizes of 60 and 90 for each of the two types of KISSES.

In Table 6, we have included an example of a typical group outcome for the tosses of the KISSES.

Table 6. Example group results for the KISSES tosses.

| Toss Number | Plain KISSES Landing on Base (n = 10) | Plain KISSES Landing on Base (n = 20) | Plain KISSES Landing on Base (n = 30) | Almond KISSES Landing on Base (n = 10) | Almond KISSES Landing on Base (n = 20) | Almond KISSES Landing on Base (n = 30) |
|---|---|---|---|---|---|---|
| 1 | 6 | 7 | 11 | 4 | 6 | 4 |
| 2 | 2 | 7 | 8 | 4 | 6 | 9 |
| 3 | 4 | 9 | 10 | 2 | 9 | 10 |
| 4 | 1 | 2 | 11 | 2 | 3 | 7 |
| 5 | 5 | 7 | 11 | 4 | 3 | 11 |
| 6 | 3 | 9 | 11 | 2 | 6 | 5 |
| 7 | 6 | 7 | 10 | 7 | 4 | 13 |
| 8 | 5 | 7 | 7 | 3 | 8 | 6 |
| 9 | 4 | 7 | 11 | 5 | 7 | 11 |
| 10 | 6 | 10 | 10 | 4 | 3 | 8 |
| 11 | 2 | 10 | 13 | 1 | 3 | 7 |
| 12 | 3 | 3 | 11 | 2 | 7 | 7 |
| 13 | 2 | 6 | 13 | 3 | 7 | 15 |
| 14 | 3 | 6 | 11 | 3 | 6 | 11 |
| 15 | 3 | 7 | 9 | 4 | 3 | 9 |

In Table 7, we have included an example of a typical three-group outcome for the tosses of the KISSES.

Table 7. Example of merged group results for the KISSES tosses.

| Toss Number | Plain KISSES Landing on Base (n = 60) | Plain KISSES Landing on Base (n = 90) | Almond KISSES Landing on Base (n = 60) | Almond KISSES Landing on Base (n = 90) |
|---|---|---|---|---|
| 1 | 23 | 42 | 16 | 23 |
| 2 | 22 | 37 | 13 | 29 |
| 3 | 19 | 31 | 21 | 26 |
| 4 | 24 | 34 | 17 | 28 |
| 5 | 25 | 32 | 18 | 26 |
| 6 | 21 | 36 | 16 | 32 |
| 7 | 16 | 30 | 21 | 20 |
| 8 | 16 | 31 | 16 | 31 |
| 9 | 21 | 32 | 17 | 25 |
| 10 | 23 | 33 | 21 | 26 |
| 11 | 18 | 42 | 21 | 28 |
| 12 | 22 | 28 | 21 | 23 |
| 13 | 23 | 26 | 15 | 24 |
| 14 | 21 | 35 | 13 | 18 |
| 15 | 17 | 31 | 15 | 36 |

To expedite data collection, we treat the results of multiple tosses of the KISSES as roughly independent trials.

The Sampling Distribution of the Proportion of Base Landings

After the groups have completed the Data Collection Sheets, work begins on the Activity 3 Worksheet.

As was noted earlier, previous experience with tossing the KISSES has shown that the proportions of base landings for the plain and almond candies are approximately p = .35 and p = .30, respectively.

For each of the plain and almond KISSES, students are asked to determine the proportion of base landings for the sample sizes of n = 10, 20, 30, 60, and 90. For each sample size, there will be 15 sample proportion values. Collecting the data provides an opportunity for students to see a concrete example of repeated sampling. Calculating the proportion of base landings for each sample reinforces the idea of a sample proportion being a random variable with values that change from sample to sample.

Once the sample proportions have been calculated, students are instructed to answer a series of questions based on calculating the means and the standard deviations of the sample proportions for the different sample sizes. By analyzing their calculations, students begin to discover properties of the distribution of a sample proportion. They will note that the distribution of sample proportion values is centered on the value of the population proportion. They will also note that the variability of the sample proportion values is related to the sample size, with a larger sample size resulting in smaller variability in the sample proportion values.
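The worksheet calculations can be sketched in Python using the example plain-KISSES counts from Tables 6 and 7. The dictionary layout and function name are our own choices for illustration; a class would substitute its own Data Collection Sheet values.

```python
from statistics import mean, stdev

# Plain KISSES landing on base in 15 tosses, keyed by sample size n
# (n = 10, 20, 30 from Table 6; n = 60, 90 from Table 7).
plain_counts = {
    10: [6, 2, 4, 1, 5, 3, 6, 5, 4, 6, 2, 3, 2, 3, 3],
    20: [7, 7, 9, 2, 7, 9, 7, 7, 7, 10, 10, 3, 6, 6, 7],
    30: [11, 8, 10, 11, 11, 11, 10, 7, 11, 10, 13, 11, 13, 11, 9],
    60: [23, 22, 19, 24, 25, 21, 16, 16, 21, 23, 18, 22, 23, 21, 17],
    90: [42, 37, 31, 34, 32, 36, 30, 31, 32, 33, 42, 28, 26, 35, 31],
}

def summarize(counts_by_n):
    """Map each n to (mean, sample SD) of the 15 sample proportions."""
    summary = {}
    for n, counts in counts_by_n.items():
        proportions = [c / n for c in counts]  # one sample proportion per toss
        summary[n] = (mean(proportions), stdev(proportions))
    return summary

summary = summarize(plain_counts)
for n, (avg, sd) in summary.items():
    print(f"n = {n:2d}: mean = {avg:.3f}, SD = {sd:.3f}")
```

In this example the mean of the sample proportions stays in the neighborhood of .35 for every sample size, while the standard deviation falls from roughly .16 at n = 10 to roughly .05 at n = 90.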

To complete the classroom work for the activity, we introduce simulation as an aid in further exploring the properties of the distribution of a sample proportion. This is done using a computer software package to generate sample proportion values for a wide range of sample sizes and population proportions, and for a larger number of repetitions of the experiment.

Homework Assignment

The activity concludes with a homework assignment (see Appendix B) that requires students to perform simulations using different sample sizes, n, and different values for the population proportion, p. The homework assignment was designed to help illustrate that, for suitable values of n and p, the sample proportion has an approximately normal distribution. The completion of the homework assignment also reinforces the properties of the sampling distribution students have seen in class and motivates the formula for the variance of the distribution of a sample proportion.

Students are first asked to use simulation to obtain values of sample proportions of base landings for 100 different samples of sizes n = 5, 15, 40, 80 and 120, for a fixed population proportion of p = .30. After obtaining the simulated sample proportion values, students are asked to construct a histogram of the values for each sample size. Through examining their histograms, students can visualize the relationship between sample size and the shape of the distribution of a sample proportion. Students can see that the sampling distribution will be approximately a normal distribution if the sample size is large. In addition, the graphs further reinforce the properties of the sampling distribution being centered on the value of the population proportion and a decrease in variability with an increase in sample size.
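The paper does not prescribe a particular software package for this step. As one possible sketch, the simulation could be run with Python's standard library as below; the function name, the seed value, and the choice to model each candy as an independent trial with success probability p are all assumptions made for illustration.

```python
import random
from statistics import mean, stdev

def simulate_sample_proportions(n, p, repetitions, rng):
    """Simulate `repetitions` samples of n tosses each and return the
    proportion of base landings observed in every sample."""
    return [
        sum(rng.random() < p for _ in range(n)) / n
        for _ in range(repetitions)
    ]

rng = random.Random(2002)  # arbitrary fixed seed so a run can be repeated
results = {
    n: simulate_sample_proportions(n, 0.30, 100, rng)
    for n in (5, 15, 40, 80, 120)
}

for n, proportions in results.items():
    print(f"n = {n:3d}: mean = {mean(proportions):.3f}, SD = {stdev(proportions):.3f}")
```

Each list in `results` can then be passed to whatever plotting tool is available to draw the histograms for the five sample sizes.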

Next, students are asked to use simulation to obtain values of sample proportions of base landings for 15 different samples of size n = 100 for population proportions of p = 0.10, 0.25, 0.50, 0.75, and 0.90. For each of the values of p, students are asked to calculate the variance of the generated sample proportion values. Then, students are asked to construct a scatterplot of the variances (vertical axis) versus the corresponding proportions (horizontal axis). The scatterplot will reveal a quadratic relationship between the variance of the sampling distribution of the sample proportion and the value of the population proportion. After students have completed the homework assignment, we suggest helping them gain some insight into the formula for the variance of the distribution of a sample proportion by combining

1. a discussion of the intuition behind the variability of the sampling distribution being the lowest for small and large values of the population proportion and the highest for values of the population proportion near 50%, with

2. a reminder of the formula for a quadratic equation.
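The quadratic pattern can be checked numerically before it is plotted. The sketch below is an illustration under stated assumptions: Python and the helper names are hypothetical choices, and the comparison column uses the standard result that the variance of a sample proportion is p(1 - p)/n, which this section motivates but does not state explicitly. With only 15 samples per value of p, the simulated variances are noisy and will only roughly follow the curve.

```python
import random
import statistics

def sample_proportions(n, p, repetitions, rng):
    """Simulate `repetitions` samples of n tosses and return each sample's proportion."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]

def theoretical_variance(p, n):
    """Standard result for the variance of a sample proportion: p(1 - p) / n."""
    return p * (1 - p) / n

rng = random.Random(15)  # arbitrary seed, fixed for reproducibility
n = 100
simulated = {}
for p in (0.10, 0.25, 0.50, 0.75, 0.90):
    props = sample_proportions(n, p, repetitions=15, rng=rng)
    simulated[p] = statistics.variance(props)  # sample variance of the 15 proportions
    print(f"p = {p:.2f}  simulated variance = {simulated[p]:.5f}  "
          f"p(1-p)/n = {theoretical_variance(p, n):.5f}")
```

Plotting the `simulated` values against p gives the scatterplot described in the assignment; the theoretical column is a downward-opening parabola in p with its maximum at p = 0.50.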

Students are next given fifteen simulated sample proportion values for samples of size 100 from a population with a proportion of 50% and fifteen simulated sample proportion values for samples of size 100 from a population with a proportion of 4%. For each of these cases, students are asked to calculate empirical percentiles for the sample proportions and to plot the empirical percentiles versus the appropriate normal distribution percentiles. In constructing these plots, students will note that in the first case the plot appears to be linear, indicating that the sampling distribution of the sample proportion is approximately normal, whereas in the second case the plot does not appear to be linear, indicating that the sampling distribution is not normal. Finally, to further investigate the relationship between the value of p and the normality of the sampling distribution, we ask students to generate sample proportions for 100 different samples of size n = 100, where the true proportion is assumed to be p = 0.01. Students then construct a histogram of the 100 simulated sample proportions and are asked to explain why the distribution looks positively skewed (right skewed) and not normal. We refer to these cases later when we discuss the values of n and p that are required to assure normality of the distribution of the sample proportion.
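A numerical companion to these plots can be sketched as follows. Several choices here are assumptions rather than part of the assignment: the values are simulated instead of taken from the worksheet, the plotting positions (i - 0.5)/k are one common convention for empirical percentiles, the correlation between ordered values and normal quantiles stands in for judging linearity by eye, and all helper names are hypothetical. With only fifteen values per case, the two correlations may not differ greatly in a given run, so the plot itself remains the main tool.

```python
import random
import statistics
from statistics import NormalDist

def sample_proportions(n, p, repetitions, rng):
    """Simulate `repetitions` samples of n tosses and return each sample's proportion."""
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(repetitions)]

def normal_quantiles(k):
    """Standard normal quantiles at plotting positions (i - 0.5) / k (an assumed convention)."""
    return [NormalDist().inv_cdf((i - 0.5) / k) for i in range(1, k + 1)]

def pearson_r(xs, ys):
    """Pearson correlation; values near 1 suggest a roughly linear probability plot."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def skewness(values):
    """Third central moment divided by the cubed population standard deviation."""
    m = statistics.mean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(4)  # arbitrary seed, fixed for reproducibility
for p in (0.50, 0.04):
    ordered = sorted(sample_proportions(100, p, repetitions=15, rng=rng))
    r = pearson_r(normal_quantiles(len(ordered)), ordered)
    print(f"p = {p:.2f}: correlation in the normal probability plot = {r:.3f}")

# 100 samples of size 100 with p = 0.01, as in the last part of the assignment
skewed = sample_proportions(100, 0.01, repetitions=100, rng=rng)
print(f"p = 0.01: skewness of the 100 sample proportions = {skewness(skewed):.2f}")
```

A positive skewness value in the last line corresponds to the right-skewed histogram students are asked to explain: a sample proportion cannot fall below 0, which is very close to p = 0.01, but it can fall well above it.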

### 2.3 Discussion

The HERSHEY'S KISS toss data provides a fun and interesting way to introduce the concept of the sampling distribution of a sample proportion. Active data collection helps reinforce the concept of repeated sampling (with different samples producing different results). Proportions of base landings can be calculated for different numbers of tosses. Changing the sample size allows students to formulate ideas about the mean and the variability of the distribution of a sample proportion and to determine the relationship between sample size and variability. Using a computer software package to simulate sample proportions produces enough data to construct graphical displays and allows for a visualization of the shape of the distribution of the sample proportions for different sample sizes.

The instructor can refer to this activity when discussing the distribution of a sample proportion from a theoretical perspective. Students have examined empirical properties of the sampling distribution and will be ready to advance to a discussion of the theoretical properties.

An extension of this activity is to use the established concepts (the idea of repeated sampling, sampling variability, and sampling distributions) to help introduce the distribution of a sample mean and the Central Limit Theorem. Students could complete a simulation exercise for sample means similar to the exercise for sample proportions. Students could be asked a series of questions based on the means and standard deviations of sample means generated for different sample sizes and from populations with differing means. In this way, students would discover properties of the center, spread, and shape of the sampling distribution of the sample mean.
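One possible form of such a simulation exercise is sketched below. Everything specific in it is an assumption made for illustration: the right-skewed exponential population, its mean of 2.0, the sample sizes, the 1000 repetitions, and the helper name `sample_means` are hypothetical and are not taken from the worksheets.

```python
import random
import statistics

def sample_means(n, population_mean, repetitions, rng):
    """Draw `repetitions` samples of size n from an exponential population; return each sample's mean."""
    rate = 1 / population_mean  # expovariate takes the rate, the reciprocal of the mean
    return [statistics.mean(rng.expovariate(rate) for _ in range(n))
            for _ in range(repetitions)]

rng = random.Random(3)  # arbitrary seed, fixed for reproducibility
population_mean = 2.0   # hypothetical value; for an exponential population the sd equals the mean
summary = {}
for n in (5, 30, 100):
    means = sample_means(n, population_mean, repetitions=1000, rng=rng)
    summary[n] = (statistics.mean(means), statistics.stdev(means))
    print(f"n = {n:3d}: mean of sample means = {summary[n][0]:.3f}, "
          f"sd of sample means = {summary[n][1]:.3f}")
```

As with sample proportions, the sample means should center on the population mean while their standard deviation shrinks as n grows. Repeating the run with a different `population_mean` addresses the "populations with differing means" part of the exercise.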

## Appendix A: Part I Worksheets

The worksheets are stored as Adobe PDF documents. Click on the title of the worksheet to view it.

## Appendix B: Part II Worksheets

The worksheets are stored as Adobe PDF documents. Click on the title of the worksheet to view it.

## Acknowledgments

The authors gratefully acknowledge the contribution of Byron Gajewski in the writing of questions posed of students in Activity 3, as well as the helpful comments and suggestions of the editor and the reviewers during the preparation of this manuscript.

The HERSHEY'S and KISSES trademarks are used with permission of Hershey Foods Corporation.

## References

National Council of Teachers of Mathematics (1989), Curriculum and Evaluation Standards for School Mathematics, Reston, VA: Author.

National Council of Teachers of Mathematics (2000), Principles and Standards for School Mathematics, Reston, VA: Author.

Shaughnessy, J. M. and Bergman, B. (1993), "Thinking About Uncertainty: Probability and Statistics," in Research Ideas for the Classroom: High School Mathematics, ed. P. S. Wilson, New York, NY: MacMillan, 177-197.

Mary Richardson
Department of Statistics
Grand Valley State University
Allendale, MI 49401
USA
richamar@gvsu.edu

Susan Haller
Department of Mathematics
Saint Cloud State University
St. Cloud, MN 56301
USA
skhaller@stcloudstate.edu