Margaret H. Smith

Pomona College

Journal of Statistics Education Volume 12, Number 2 (2004), jse.amstat.org/v12n2/smith.html

Copyright © 2004 by Margaret H. Smith, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

**Key Words:** Sample size; Sampling distribution; Standard error.

I have tried many ways of explaining that it is the absolute size of the sample, not the size of the sample relative to the population, that matters for our confidence in an estimate. For example, I point out that we could surely get a pretty accurate estimate of the probability of heads from a few thousand coin flips, even though the population is theoretically infinite. Whether judged by their skeptical looks or their answers to test questions, many students remain unconvinced by these arguments. I now begin this discussion not with abstract arguments, but with a hands-on activity that lets students discover the principle for themselves.

Students are told to fill in the 12 cells labeled A-L in the table in Appendix I and to answer the remaining questions with the understanding that they will be allowed to revise their answers after the classroom discussion. A very nice feature of this exercise is that individual students cannot see the overall answer in their results; it is only when students compile their collective data that the conclusions emerge. It is also nice that the parallel data collection by individual students drives home the elusive point that any random sample is but one of many samples that might have been obtained. And the conclusion is made more convincing by the fact that the students themselves gathered the data, rather than having the professor present results that might have been chosen selectively. It is much more powerful when an audience member reaches into his own hat and finds a rabbit.

When the students come to class with their results, I put the table in Appendix II on the board and have all the students come to the board and put vertical marks in the appropriate spaces—thereby creating 12 bar graphs. As a class, we sit and stare at these graphs for a few minutes and jot down one or two observations. Then we have a discussion for 30 minutes and develop the idea that it is n, the absolute size of the sample, and not n/N, the size of the sample relative to the population, that determines the dispersion of the sample results.

I have found it useful to have students compare the graphs in three different ways. First, I have them increase the sample as a fraction of the population, n/N, from .10 to .25 to 1.00 (graphs A, B, C; E, F, G; or J, K, L). They all see that the sampling distribution becomes more concentrated around p as n/N rises. I want them to make these comparisons first because it is what they intuitively expect to see.

Second, I have them compare the sampling distributions for different values of n, holding n/N constant. First we compare n = 40, N = 400 (graph A); n = 100, N = 1000 (graph E); and n = 1000, N = 10,000 (graph J). Even though each sample is 10% of its population, the spread of these three sampling distributions varies quite a bit depending on the sample size, n. Then they compare the sampling distributions for the 25% samples (graphs B, F, K) and notice again that the accuracy of the sample does not seem to depend on n/N but rather on n. In each case, the sampling distribution becomes more concentrated around p when n increases.

Finally, I have them compare the sampling distributions for a given sample size n, varying the size of the population N. First we look at n = 40, with the population size varying from 400 to 1000 to 10,000 (graphs A, D, and H). They see that the sampling distributions are similarly shaped, even though n/N varies from 10% to 4% to 0.4%. This observation—that it is n, rather than n/N, that determines the shape of the sampling distribution—is confirmed by looking at sample size n = 100 for population sizes ranging from 400 to 1000 to 10,000 (graphs B, E, I).

In this way, I try to convince them that n is what drives our confidence in sample estimates, because the distribution of p-hat narrows as n grows, not as n/N grows. It helps to concede that if n/N is very large, the p-hat distribution will of course narrow considerably, which they all seem to understand intuitively (especially when they think about the case n = N). This discussion gives students a richer appreciation of sampling distributions and the importance of n.
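The point can also be checked against the exact formula: when sampling without replacement, the standard error of p-hat is sqrt(p(1-p)/n) multiplied by the finite population correction sqrt((N-n)/(N-1)). A short sketch (assuming p = 0.40, as in the discussion below) shows how little the correction matters when n/N is small:

```python
import math

def se_phat(p, n, N):
    """Standard error of the sample proportion when sampling without
    replacement: sqrt(p(1-p)/n) times the finite population
    correction factor sqrt((N-n)/(N-1))."""
    return math.sqrt(p * (1 - p) / n) * math.sqrt((N - n) / (N - 1))

p, n = 0.40, 40
for N in (400, 1000, 10_000):
    print(f"n = {n}, N = {N:>6}: SE = {se_phat(p, n, N):.4f}")
```

Raising N from 400 to 10,000 moves the SE only from about 0.074 to 0.077, while raising n from 40 to 400 cuts the SE by a factor of roughly sqrt(10); and at n = N the correction factor is zero, so the "sample" reproduces the population exactly.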

In addition to addressing the central limit theorem, this exercise gives students an opportunity to work with standardized variables.

Such a mistake provides a wonderful opportunity to introduce hypothesis testing. As the students look at the results on the board, someone may remark how different one student’s result is from everyone else’s. If not, I make that observation. Either way, there is some laughter, attempts to identify the student with the odd result, and some nervous deer-in-the-headlights looks from some students.

I reassure everyone that no dire consequences are imminent, and then ask them to consider the plausibility of the odd result: “Maybe they just had an unlucky sample? Maybe, by the luck of the draw, they selected too many males? Or maybe there was a mistake in using the software or recording their results?” After a brief pause, I ask the punch line, “How could we draw a reasonable conclusion about whether this sample with only 35% successes was (a) an unlucky draw from a population with a 0.40 probability of a success; or (b) something else (like an inadvertent draw from a different population or a clerical error)?”

If no one recommends the hoped-for procedure, I suggest it: “If this was a random sample of size 1000 from an infinite population with a 0.40 success probability, what would be the probability of so few successes?”

Using the binomial distribution, the exact probability of so few successes (fewer than 351) is 0.000645. The two-sided p-value, the probability that the number of successes would be this far or farther from 40% (35% or fewer, or 45% or more), is 2(0.000645) = 0.00129. We can also use a normal approximation, with or without a continuity correction, to estimate this probability.
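The exact binomial tail can be computed in a few lines of plain Python; a sketch, using the log of the pmf (via lgamma) so that the large binomial coefficients never overflow:

```python
import math

def binom_log_pmf(k, n, p):
    # log of the binomial pmf, computed with lgamma to avoid overflow
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))

n, p = 1000, 0.40
# P(X <= 350): the chance of 35% successes or fewer
p_lower = sum(math.exp(binom_log_pmf(k, n, p)) for k in range(351))
print(f"one-sided p-value: {p_lower:.6f}")
print(f"two-sided p-value: {2 * p_lower:.5f}")
```

The one-sided value agrees with the 0.000645 quoted above, and doubling it gives the two-sided 0.00129.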

Because the answers are categorical, the chi-square statistic can be used to assess whether the differences between the before and after responses are statistically persuasive. The exact p value for the chi-square statistic can be computed using the multivariate hypergeometric distribution (Agresti and Wackerly 1977). For Question 1, the exact p value is 4.3 x 10^{-7}; for Question 2, the exact p value is 4.6 x 10^{-14}.
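The Pearson chi-square statistic itself is easy to compute from the Table 1 counts; a sketch is below. (Note that the p values above come from an exact test rather than the asymptotic chi-square distribution, which is unreliable here because several expected cell counts are small.)

```python
# Pearson chi-square statistic for the before/after counts in Table 1.
# Expected count in each cell = (row total)(column total)/(grand total).
before = [3, 4, 7, 16, 0]
after = [22, 2, 3, 2, 1]

grand = sum(before) + sum(after)
chi2 = 0.0
for b, a in zip(before, after):
    row_total = b + a
    for obs, col_total in ((b, sum(before)), (a, sum(after))):
        expected = row_total * col_total / grand
        chi2 += (obs - expected) ** 2 / expected
print(f"chi-square = {chi2:.2f} on 4 degrees of freedom")
```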

(HINT: Use SSP. First type the variable names A-L at the top of the first 12 columns (in place of var1, var2, etc.). Then choose Uncertainty on the tool bar and choose the Random Numbers option. Choose “Draw a random sample without replacement...”. Enter the population size in the first box and the sample size in the second box, and choose to put your random sample in the spreadsheet. Finally, select the appropriate variable (A-L) for saving the data. The Numerical Order option on the screen will allow you to quickly determine how many females are in your sample. Repeat the process for all 12 samples.)
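For classes without SSP, the same draws can be made in any language; a minimal Python sketch (assuming, for illustration, a population that is 40% female) is:

```python
import random

def draw_pct_female(N, n, p_female=0.40, rng=random):
    """Simulate one cell of the table: draw a sample of size n without
    replacement from a population of size N (1 = female, 0 = male)
    and return the sample proportion of females."""
    n_female = round(p_female * N)
    population = [1] * n_female + [0] * (N - n_female)
    sample = rng.sample(population, n)  # sampling without replacement
    return sum(sample) / n

# e.g. cell A: a sample of n = 40 from a population of N = 400
print(draw_pct_female(400, 40))
```

When n = N, the sample is the whole population and the proportion comes back as exactly 0.40, which previews the class discussion of graphs C, G, and L.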

| sample n | n/N (N = 400) | % female | n/N (N = 1000) | % female | n/N (N = 10,000) | % female |
|---|---|---|---|---|---|---|
| 40 | 40/400 = 0.10 | A | 40/1000 = 0.04 | D | 40/10000 = 0.004 | H |
| 100 | 100/400 = 0.25 | B | 100/1000 = 0.10 | E | 100/10000 = 0.01 | I |
| 250 | | | 250/1000 = 0.25 | F | | |
| 400 | 400/400 = 1.00 | C | | | | |
| 1000 | | | 1000/1000 = 1.00 | G | 1000/10000 = 0.10 | J |
| 2500 | | | | | 2500/10000 = 0.25 | K |
| 10000 | | | | | 10000/10000 = 1.00 | L |

Think about the following questions. Answer them the best you can by anticipating what you will see in class. And then
__modify__ your answers after you see the results for the whole class.

- (a) Compare your results to others in the class. For which cells are the results similar and for which are they not? Explain.
- (b) What do you notice as you compare the distributions of A to B to C, of E to F to G, or of J to K to L? Explain what is going on as you move from low-n/N samples to samples with n/N = 1.
- (c) What do you notice as you compare these 10% samples: A, E, and J?
- (d) What do you notice as you compare these 25% samples: B, F, and K?
- (e) What do you notice as you compare these three distributions with sample size n = 40: A, D, and H?
- (f) What do you notice as you compare these three distributions with sample size n = 100: B, E, and I?
- (g) What happens to the concentration of the sampling distribution as n rises, holding n/N fixed (see questions (c) and (d))?
- (h) What happens to the concentration of the sampling distribution as n/N rises, holding n fixed (see questions (e) and (f))?
- (i) In general, for the more typical situation where the population is very large and the sample is less than a quarter of the population, does the accuracy of the point estimate depend on the sample's size as a fraction of the population or on the absolute size of the sample?
- (j) Based on what you learned from the exercise above, is this statement true or false? “A 10% sample should be sufficient in all situations. For a population of size 400, a survey with 47 people should be more than enough to get a sample estimate that is representative of the population.”

(Columns A-C are samples from the N = 400 population, D-G from N = 1000, and H-L from N = 10,000.)

| sample n | 40 | 100 | 400 | 40 | 100 | 250 | 1000 | 40 | 100 | 1000 | 2500 | 10000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| n/N | .10 | .25 | 1.00 | .04 | .10 | .25 | 1.00 | .004 | .01 | .10 | .25 | 1.00 |
| graph | A | B | C | D | E | F | G | H | I | J | K | L |
| >.125-.175 | | | | | | | | | | | | |
| >.175-.225 | | | | | | | | | | | | |
| >.225-.275 | | | | | | | | | | | | |
| >.275-.325 | | | | | | | | | | | | |
| >.325-.375 | | | | | | | | | | | | |
| >.375-.425 | | | | | | | | | | | | |
| >.425-.475 | | | | | | | | | | | | |
| >.475-.525 | | | | | | | | | | | | |
| >.525-.575 | | | | | | | | | | | | |
| >.575-.625 | | | | | | | | | | | | |
| >.625-.675 | | | | | | | | | | | | |

Two Observations:

Takeaways:

- (a) Draw a bar graph for p-hat with p-hat on the x-axis and % frequency on the y-axis.
- (b) Explain how this bar graph compares to the theoretical binomial probability distribution of X/n when p = .4 and n = 1000.
- (c) Explain why p-hat can be considered a random variable. What kind of random variable is this sample estimate? A mean or a proportion?
- (d) Explain why this histogram is expected to have a normal distribution, using the concept of the Central Limit Theorem.
- (e) Calculate the mean and standard deviation of the random variable p-hat.
- (f) Standardize the random variable p-hat to have a mean of zero and a standard deviation of one. (Subtract the mean of the p-hat values from each p-hat, and divide each difference by the standard deviation of the distribution of p-hat.) Explain how to interpret the standardized values.
- (g) Draw the bar graph for Z, the standardized p-hat, from part (f), with Z on the x-axis and % frequency on the y-axis. How does this graph compare to the graph in part (a) above? How is it similar and how is it different?
- (h) What fraction of the values of p-hat is within one standard deviation of the mean of p-hat? What fraction is within two standard deviations of the mean of p-hat?
- (i) Are your two histograms from (a) and (g) perfectly normally distributed? How can you use the information from (h) to tell?
- (j) Why are they not perfectly normal?
- (k) Summarize the main thing you learned from this exercise.
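The standardization step above (subtract the mean, divide by the standard deviation) can be sketched in a few lines, using hypothetical p-hat values for illustration:

```python
import statistics

# hypothetical sample proportions collected by a class
p_hats = [0.38, 0.41, 0.40, 0.43, 0.37, 0.39, 0.42, 0.40]

mean = statistics.mean(p_hats)
sd = statistics.stdev(p_hats)  # sample standard deviation

# standardized values: mean 0, standard deviation 1
z = [(x - mean) / sd for x in p_hats]
print([round(v, 2) for v in z])
```

By construction the standardized values have mean 0 and standard deviation 1, so each z measures how many standard deviations a class member's p-hat sits from the class average.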

- Plot the data in a bar graph where x/n is on the x-axis and % frequency is on the y-axis. What do you expect the shape
to be? Draw a diagram of what you expect, labeling the axes appropriately. Explain why you expect this shape. Does the
actual distribution of x/n look exactly like what you expected? How similar or different is it to what you expected?
- What is the mean of your distribution of x/n? Label both the empirical mean (the average of the x/n values) and
theoretical mean (p) on your graph (they might be quite close).
- What is the theoretical standard deviation of your histogram (for p = 0.40 and n = 1000)? Label ±1 theoretical SD and
±2 theoretical SD from the theoretical mean on your histogram.
- What fraction of your 500 p-hat or x/n values actually falls within 2 theoretical standard deviations of the true p?
- Now calculate a 95% confidence interval (lower and upper bounds) for each of your 500 p-hat estimates of p.
- How many of your 500 confidence intervals actually capture the true p of 40%? Is this what you expected? Explain
carefully.
- Draw a line under the histogram demarcating the position of the point estimate and the width of its associated confidence interval for each of the confidence intervals that does __not__ capture p = 40%. What do you notice about the distance of those point estimates from p = 40%?
- Interpret the meaning of a 95% confidence interval for any one particular point estimate, based on what you’ve learned above.
- In practice, people do not know the true population mean and do not have several sample estimates of p; rather they have only one sample estimate of p. Relate this exercise to a real world situation where The Wall Street Journal might take one political poll of 1000 people to anticipate the result of the next presidential election. What would be the margin of error for this poll (for a 95% confidence interval)? Explain how you would interpret the result of this political poll now that you understand about sampling distributions. How confident would you be in the results of this poll? Explain.
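The coverage exercise can also be simulated directly. A sketch, assuming 500 samples of n = 1000 from a population with p = 0.40 and the usual normal-approximation interval, p-hat plus or minus 1.96 times sqrt(p-hat(1 - p-hat)/n):

```python
import math
import random

random.seed(1)  # reproducible draws
p, n, trials = 0.40, 1000, 500

covered = 0
for _ in range(trials):
    successes = sum(random.random() < p for _ in range(n))
    p_hat = successes / n
    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width <= p <= p_hat + half_width:
        covered += 1

print(f"{covered} of {trials} intervals capture p = 0.40")
```

Roughly 95% of the 500 intervals should capture the true p. The half-width with p-hat near 0.40 and n = 1000 is about 0.03, the familiar margin of error of plus or minus 3 percentage points quoted for polls of this size.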

- “You need to obtain a sample that is at least 10% of the population in order to get a reliable estimate of the
population parameter.”
- I do not believe this
- I am skeptical
- Neutral/not sure
- This seems right
- I believe this strongly

Table 1: Distribution of Responses to Question 1

| | Before (n) | Before (%) | After (n) | After (%) |
|---|---|---|---|---|
| I do not believe this | 3 | 10.0 | 22 | 73.3 |
| I am skeptical | 4 | 13.3 | 2 | 6.7 |
| Neutral/not sure | 7 | 23.3 | 3 | 10.0 |
| This seems right | 16 | 53.3 | 2 | 6.7 |
| I believe this strongly | 0 | 0.0 | 1 | 3.3 |
| total | 30 | 100.0 | 30 | 100.0 |
- “For large population sizes, the size of the population is irrelevant to the reliability of the sample estimate;
what matters is the absolute size of the sample.”
- I do not believe this
- I am skeptical
- Neutral/not sure
- This seems right
- I believe this strongly


Table 2: Distribution of Responses to Question 2

| | Before (n) | Before (%) | After (n) | After (%) |
|---|---|---|---|---|
| I do not believe this | 11 | 36.7 | 1 | 3.3 |
| I am skeptical | 9 | 30.0 | 0 | 0.0 |
| Neutral/not sure | 6 | 20.0 | 0 | 0.0 |
| This seems right | 4 | 13.3 | 3 | 10.0 |
| I believe this strongly | 0 | 0.0 | 26 | 86.7 |
| total | 30 | 100.0 | 30 | 100.0 |

Margaret H. Smith

Department of Economics

Pomona College

Claremont, CA 91711

U. S. A.
*msmith@pomona.edu*
