A Sample/Population Size Activity:

Is it the sample size or the sample as a fraction of the population that matters?

Margaret H. Smith
Pomona College

Journal of Statistics Education Volume 12, Number 2 (2004), jse.amstat.org/v12n2/smith.html

Copyright © 2004 by Margaret H. Smith, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Sample size; Sampling distribution; Standard error.

Abstract

Unless the sample encompasses a substantial portion of the population, the standard error of an estimator depends on the size of the sample, but not the size of the population. This is a crucial statistical insight that students find very counterintuitive. After trying several ways of convincing students of the validity of this principle, I have finally found a simple memorable activity that convinces students beyond a reasonable doubt. As a bonus, the data generated by this activity can be used to illustrate the central limit theorem, confidence intervals, and hypothesis testing.

1. The Problem

I have often led students in my introductory, college-level statistics classes through a discussion of how to conduct a poll of student opinion on some current controversial topic. The population might be the students at a large university, the students residing in a given state, or all students in the United States. At some point, I ask them how many students they would survey. Most base their answer on the size of the population; for example, “We should survey 10 percent of the students.” This reasoning also appears if I initially focus the discussion on a single university and then ask how many students should be surveyed if our population consisted of all the students in the state. Almost all students increase the sample size, because they believe that a larger population requires a larger sample. The only naysayers are persons familiar with actual public opinion polls who point out that only a few thousand voters are surveyed in presidential election polls.

I have tried many ways of explaining that it is the absolute size of the sample, not the size of the sample relative to the population, that matters for our confidence in an estimate. For example, I point out that we could surely get a pretty accurate estimate of the probability of heads from a few thousand coin flips, even though the population is theoretically infinite. Whether judged by their skeptical looks or their answers to test questions, many students remain unconvinced by these arguments. I now begin this discussion not with abstract arguments, but with a hands-on activity that lets students discover the principle for themselves.
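The principle can be stated compactly with the standard textbook formula for the standard error of a sample proportion drawn without replacement. The formula is not given in the article itself; the symbols p (population proportion), n (sample size), and N (population size) follow the usage in the rest of the text.

```latex
\operatorname{SE}(\hat{p})
  \;=\; \sqrt{\frac{p(1-p)}{n}} \;\cdot\; \sqrt{\frac{N-n}{N-1}}
```

The first factor depends only on n. The second factor, the finite population correction, stays close to 1 unless n is a sizable fraction of N: with n = 1000 it is roughly 0.95 for N = 10,000 and roughly 0.9995 for N = 1,000,000.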

2. The Activity

Early in the semester students do the homework exercise described in Appendix I using Smith’s Statistical Package (SSP) http://www.economics.pomona.edu/StatSite/framepg.html, a free program that they can use with virtually no instruction; the exercise can also be done with most commercial statistics software packages, a spreadsheet program like Excel, or pocket calculators like the TI-83. The exercise can easily be modified to reflect a public opinion poll or some other design that appeals to students.

Students are told to fill in the 12 cells labeled A-L in the table in Appendix I and to answer the remaining questions with the understanding that they will be allowed to revise their answers after the classroom discussion. A very nice feature of this exercise is that individual students cannot see the overall answer in their results; it is only when students compile their collective data that the conclusions emerge. It is also nice that the parallel data collection by individual students drives home the elusive point that any random sample is but one of many samples that might have been obtained. And the conclusion is made more convincing by the fact that the students themselves gathered the data, rather than having the professor present results that might have been chosen selectively. It is much more powerful when an audience member reaches into his own hat and finds a rabbit.

When the students come to class with their results, I put the table in Appendix II on the board and have all the students come to the board and put vertical marks in the appropriate spaces—thereby creating 12 bar graphs. As a class, we sit and stare at these graphs for a few minutes and jot down one or two observations. Then we have a discussion for 30 minutes and develop the idea that it is n, the absolute size of the sample, and not n/N, the size of the sample relative to the population, that determines the dispersion of the sample results.

I have found it useful to have students compare the graphs in three different ways. First, I have them increase the sample as a fraction of the population, n/N, from .10 to .25 to 1.00 (graphs A, B, C; E, F, G; or J, K, L). They all see that the sampling distribution becomes more concentrated around p as n/N rises. I want them to make these comparisons first because it is what they intuitively expect to see.

Second, I have them compare the sampling distributions for different values of n, holding n/N constant. First we compare n = 40, N = 400 (graph A); n = 100, N = 1000 (graph E); and n = 1000, N = 10,000 (graph J). Even though each sample is 10% of its population, the spread of these three sampling distributions varies quite a bit depending on the sample size, n. Then they compare the sampling distributions for the 25% samples (graphs B, F, K) and notice again that the accuracy of the sample does not seem to depend on n/N but rather on n. In each case, the sampling distribution becomes more concentrated around p when n increases.

Finally, I have them compare the sampling distributions for a given sample size n, varying the size of the population N. First we look at n = 40, with the population size varying from 400 to 1000 to 10,000 (graphs A, D, and H). They see that the sampling distributions are similarly shaped, even though n/N is varying from 10% to 4% to 0.4%. This observation, that it is n rather than n/N that determines the shape of the sampling distribution, is confirmed by looking at sample size n = 100 for population sizes ranging from 400 to 1000 to 10,000 (graphs B, E, I).

In this way, I try to convince them that n is what drives our confidence in sample estimates, because the distribution of p-hat narrows with n, not n/N. It helps to concede to them that if n/N is very large, of course the p-hat distribution will narrow considerably, which they all seem to understand intuitively (especially when they think about n = N!). This discussion gives students a richer appreciation of sampling distributions and the importance of n.
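The three comparisons can also be previewed with a small simulation. The following is a minimal sketch, not the SSP procedure used in class: it assumes a hypothetical class of 150 students, uses Python's standard library, and the names `sample_proportion`, `spread`, and `CELLS` are illustrative only. For each of the 12 cells it reports the standard deviation of the simulated students' sample proportions.

```python
import random
import statistics

# (N, n) for the twelve cells A-L of the classroom table.
CELLS = {
    "A": (400, 40), "B": (400, 100), "C": (400, 400),
    "D": (1000, 40), "E": (1000, 100), "F": (1000, 250), "G": (1000, 1000),
    "H": (10000, 40), "I": (10000, 100), "J": (10000, 1000),
    "K": (10000, 2500), "L": (10000, 10000),
}

def sample_proportion(N, n, rng, p=0.40):
    """Fraction of females in one sample of n drawn without replacement."""
    females = round(p * N)                    # members 1..females are female
    drawn = rng.sample(range(1, N + 1), n)
    return sum(1 for member in drawn if member <= females) / n

def spread(N, n, students=150, seed=0):
    """Standard deviation of p-hat across a simulated class (assumed size)."""
    rng = random.Random(seed)
    p_hats = [sample_proportion(N, n, rng) for _ in range(students)]
    return statistics.pstdev(p_hats)

spreads = {cell: spread(N, n) for cell, (N, n) in CELLS.items()}

for label, group in [("n/N rising, N = 400", "ABC"),
                     ("n/N = 0.10, n rising", "AEJ"),
                     ("n = 40, N rising", "ADH")]:
    print(label, {cell: round(spreads[cell], 4) for cell in group})
```

In a run of this sketch, the spreads for A, E, and J should shrink markedly even though each is a 10% sample, while the spreads for A, D, and H should be of similar size even though n/N falls from 10% to 0.4%.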

3. Reusing the Data

Another nice feature of this activity is that the data can be recycled to illustrate other points throughout the course, such as the central limit theorem, confidence intervals, and hypothesis testing. Not only do all the virtues described above reapply, but there is a very nice continuity to the discussion.

4. Central Limit Theorem

A rough bell shape can usually be seen in the student data. I tell the students that I have supplemented their data for n = 1000, N = 10,000 by drawing another 500 samples and collecting the results in a data file that they can download from the course web site in order to do the homework assignment shown in Appendix III. (I use 500, instead of 1000, samples because the students can too easily become confused between the 1000 samples and the sample size of 1000.) We can imagine that these additional data are what the results might look like if 500 other students did the same activity. (After this activity has been done enough times, I will actually be able to combine the student data for many classes over the years.)

In addition to addressing the central limit theorem, this exercise gives students an opportunity to work with standardized variables.
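For readers without access to the course data file, a stand-in can be simulated. The sketch below is an assumption-laden substitute, not the author's file: it draws 500 samples of n = 1000 without replacement from a population of N = 10,000 that is 40% female, standardizes the resulting p-hat values, and checks what fraction fall within one and two standard deviations of their mean.

```python
import random
import statistics

N, n, females = 10_000, 1_000, 4_000   # population, sample size, females (40%)
num_samples = 500
rng = random.Random(2004)               # arbitrary seed, for repeatability only

p_hats = []
for _ in range(num_samples):
    drawn = rng.sample(range(1, N + 1), n)
    p_hats.append(sum(1 for member in drawn if member <= females) / n)

mean = statistics.mean(p_hats)
sd = statistics.pstdev(p_hats)

# Standardize: subtract the mean of the p-hats, divide by their standard deviation.
z = [(p_hat - mean) / sd for p_hat in p_hats]

within_1 = sum(abs(value) <= 1 for value in z) / num_samples
within_2 = sum(abs(value) <= 2 for value in z) / num_samples

print(f"mean of p-hat = {mean:.4f}, sd of p-hat = {sd:.4f}")
print(f"within 1 sd: {within_1:.3f}, within 2 sd: {within_2:.3f}")
```

If the bell shape holds, the two fractions should land near the familiar normal benchmarks of roughly 68% and 95%, though not exactly, since 500 samples give only an approximation.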

5. Confidence Interval

The dispersion of the student results for each combination of n and N nicely illustrates two central principles of sampling. First, the sample results depend on the specific random sample chosen and therefore are subject to sampling error. Second, the fact that there is a distribution of the sample results allows us to draw inferences from a single sample. In particular, the sample proportion is probably close to the population success probability. This observation sets up my promise that we will later see exactly how to quantify that probability—both theoretically and by using a lot more random samples than are currently on the board. Appendix IV shows a homework assignment that brings out these details.
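The promise of "a lot more random samples" can be sketched in code. The following is an illustrative simulation under stated assumptions, not the homework solution: it draws 500 samples of n = 1000 from a population of N = 10,000 with p = 0.40, builds the usual large-sample 95% interval p-hat ± 1.96·sqrt(p-hat(1 − p-hat)/n) around each estimate, and counts how many intervals capture the true p.

```python
import math
import random

N, n, females = 10_000, 1_000, 4_000
p_true = females / N
num_samples = 500
z_95 = 1.96                             # normal critical value for 95%
rng = random.Random(12)                 # arbitrary seed, for repeatability only

covered = 0
for _ in range(num_samples):
    drawn = rng.sample(range(1, N + 1), n)
    p_hat = sum(1 for member in drawn if member <= females) / n
    half_width = z_95 * math.sqrt(p_hat * (1 - p_hat) / n)
    if p_hat - half_width <= p_true <= p_hat + half_width:
        covered += 1

coverage = covered / num_samples
print(f"{covered} of {num_samples} intervals capture p = {p_true:.2f} "
      f"(coverage {coverage:.3f})")
```

Because these samples are drawn without replacement and each is 10% of the population, the interval above (which ignores the finite population correction) is slightly wider than necessary, so the observed coverage may run a little above 95%.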

6. Hypothesis Tests

Often, one or two students will report back a sample result that is plainly implausible: for example, that a sample of 1000 from a population of 10,000 with a success probability of 0.40 had a sample success frequency of 0.35. Presumably, the students made a mistake using the software or recording their results.

Such a mistake provides a wonderful opportunity to introduce hypothesis testing. As the students look at the results on the board, someone may remark how different one student’s result is from everyone else’s. If not, I make that observation. Either way, there is some laughter, attempts to identify the student with the odd result, and some nervous deer-in-the-headlights looks from some students.

I reassure everyone that no dire consequences are imminent, and then ask them to consider the plausibility of the odd result: “Maybe they just had an unlucky sample? Maybe, by the luck of the draw, they selected too many males? Or maybe there was a mistake in using the software or recording their results?” After a brief pause, I ask the punch line, “How could we draw a reasonable conclusion about whether this sample with only 35% successes was (a) an unlucky draw from a population with a 0.40 probability of a success; or (b) something else (like an inadvertent draw from a different population or a clerical error)?”

If no one recommends the hoped-for procedure, I suggest it: “If this was a random sample of size 1000 from an infinite population with a 0.40 success probability, what would be the probability of so few successes?”

Using the binomial distribution, the exact probability of so few successes (fewer than 351) is 0.000645. The two-sided p-value, the probability that the number of successes would be so far from 40% (35% or fewer or 45% or more), is 2(0.000645) = 0.00129. We can also use a normal approximation, with or without a continuity correction, to estimate this probability.
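The binomial tail probability quoted above can be checked with exact integer arithmetic. This is a minimal sketch using only Python's standard library; it writes p = 0.40 as 2/5 so that every term is an exact integer over 5^1000, and the variable names are illustrative.

```python
from math import comb

n = 1000            # sample size
threshold = 350     # "fewer than 351" successes, i.e. 350 or fewer

# With p = 2/5:  P(X = k) = C(n, k) * 2**k * 3**(n - k) / 5**n
numerator = sum(comb(n, k) * 2**k * 3**(n - k) for k in range(threshold + 1))
lower_tail = numerator / 5**n           # Python divides big integers accurately

two_sided = 2 * lower_tail              # doubling convention used in the text

print(f"P(X <= {threshold}) = {lower_tail:.6f}")
print(f"doubled two-sided p-value = {two_sided:.5f}")
```

The printed lower-tail value should be close to the figure reported in the text. Doubling the lower tail follows the article's convention; it is an approximation because the binomial distribution with p = 0.40 is not perfectly symmetric.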

7. Does It Work?

To test the effectiveness of this activity, I asked the 30 students in my Fall 2003 introductory statistics class the two questions in Appendix V at the beginning of the semester and after doing the activity. The first statement is intended to represent the intuition students bring to a statistics class; the second statement is the intuition I would like them to have after taking the class. The distributions of the before and after answers are shown in Tables 1 and 2. At the start of the semester, 53% of the students answered “This seems right” to Question 1 and 67% answered “I do not believe this” or “I am skeptical” to Question 2. After the activity, 73% answered “I do not believe this” to Question 1 and 87% answered “I believe this strongly” to Question 2.

Because the answers are categorical, the chi-square statistic can be used to assess whether the differences between the before and after responses are statistically persuasive. The exact p-value for the chi-square statistic can be computed using the multivariate hypergeometric distribution (Agresti and Wackerly 1977). For Question 1, the exact p-value is 4.3 x 10^-7; for Question 2, the exact p-value is 4.6 x 10^-14.
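As a partial illustration, the Pearson chi-square statistic for each 5 x 2 table of before and after counts can be computed directly from Tables 1 and 2. The sketch below computes only the statistic; it does not reproduce the exact conditional p-values reported above, and the function name `pearson_chi_square` is an illustrative choice.

```python
def pearson_chi_square(before, after):
    """Pearson chi-square statistic for a (categories x 2) table of counts."""
    total_before, total_after = sum(before), sum(after)
    grand_total = total_before + total_after
    statistic = 0.0
    for b, a in zip(before, after):
        row_total = b + a
        if row_total == 0:
            continue                     # an empty category contributes nothing
        expected_b = row_total * total_before / grand_total
        expected_a = row_total * total_after / grand_total
        statistic += (b - expected_b) ** 2 / expected_b
        statistic += (a - expected_a) ** 2 / expected_a
    return statistic

# Counts from Tables 1 and 2, ordered from "I do not believe this"
# to "I believe this strongly".
q1_before, q1_after = [3, 4, 7, 16, 0], [22, 2, 3, 2, 1]
q2_before, q2_after = [11, 9, 6, 4, 0], [1, 0, 0, 3, 26]

chi2_q1 = pearson_chi_square(q1_before, q1_after)
chi2_q2 = pearson_chi_square(q2_before, q2_after)
degrees_of_freedom = (len(q1_before) - 1) * (2 - 1)

print(f"Question 1: chi-square = {chi2_q1:.2f} on {degrees_of_freedom} df")
print(f"Question 2: chi-square = {chi2_q2:.2f} on {degrees_of_freedom} df")
```

Several expected counts in these tables are small, which is one reason to prefer an exact test over the large-sample chi-square approximation.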

Acknowledgements

The author would like to thank the editor and referees for their very helpful comments and suggestions.

Appendix I: Sample Size Assignment

How do samples of different sizes perform in terms of giving an accurate representation of the underlying population? Does it depend on how large the sample is relative to the size of the population or does it depend on the absolute size of the sample? Let’s see. We will look at three populations that are 40% female and 60% male. First let’s examine a population of size 400 (N = 400) and assume the students numbered 1-160 are female and the students numbered 161-400 are male. Second, assume a population size 1,000 (N = 1,000) and assume the students numbered 1-400 are female and the students numbered 401-1000 are male. Third, examine a population of size 10,000 (N = 10,000) and assume the students numbered 1-4000 are female and the students numbered 4001-10,000 are male. For each of these three populations, record the fraction of females you get in the samples of various sizes in the cells labeled A-L below. (Do not fill in any cells other than A-L.)

(HINT: Use SSP. First type in the variable names A-L at the top of the first 12 columns (in place of var1, var2, etc.) Then choose Uncertainty on the tool bar and choose the Random Numbers option. Choose “Draw a random sample without replacement...” Enter the population size in the first box and the sample size in the second box and choose to put your random sample in the spreadsheet. Finally, select the appropriate variable (A-L) for saving the data. The Numerical Order option on the screen will allow you to quickly determine how many females are in your sample. Repeat the process for all 12 samples.)

population	N = 400		N = 1000		N = 10,000
Sample n	n/N	% female	n/N	% female	n/N	% female
40	40/400 = 0.10	A	40/1000 = 0.04	D	40/10000 = 0.004	H
100	100/400 = 0.25	B	100/1000 = 0.10	E	100/10000 = 0.01	I
250			250/1000 = 0.25	F
400	400/400 = 1.00	C
1000			1000/1000 = 1.00	G	1000/10000 = 0.10	J
2500					2500/10000 = 0.25	K
10000					10000/10000 = 1.00	L
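For students working without SSP, the twelve draws in the table above could be scripted. This is a minimal sketch of one student's worksheet in Python's standard library, not the SSP procedure described in the hint; it follows the assignment's convention that the lowest-numbered 40% of each population are female, and the names `CELLS` and `fraction_female` are illustrative.

```python
import random

P_FEMALE = 0.40

# (N, n) for each worksheet cell, matching the table layout.
CELLS = {
    "A": (400, 40), "B": (400, 100), "C": (400, 400),
    "D": (1000, 40), "E": (1000, 100), "F": (1000, 250), "G": (1000, 1000),
    "H": (10000, 40), "I": (10000, 100), "J": (10000, 1000),
    "K": (10000, 2500), "L": (10000, 10000),
}

def fraction_female(N, n, rng):
    """Draw n of the numbers 1..N without replacement; count those <= 0.4 N."""
    cutoff = round(P_FEMALE * N)         # e.g. students 1-160 when N = 400
    drawn = rng.sample(range(1, N + 1), n)
    return sum(1 for member in drawn if member <= cutoff) / n

rng = random.Random()                    # unseeded: each student gets new samples
worksheet = {cell: fraction_female(N, n, rng) for cell, (N, n) in CELLS.items()}

for cell, value in worksheet.items():
    N, n = CELLS[cell]
    print(f"{cell}: N = {N:>6}, n = {n:>6}, fraction female = {value:.3f}")
```

Cells C, G, and L sample the entire population, so they always return exactly 0.400; the other nine cells will differ from one run to the next, just as they differ from one student to the next.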

Think about the following questions. Answer them the best you can by anticipating what you will see in class. And then modify your answers after you see the results for the whole class.

a. Compare your results to others in the class. For which cells are the results similar and for which cells are they not? Explain.
b. What do you notice as you compare the distributions of A to B to C or the distributions of E to F to G or the distributions of J to K to L? Explain what is going on as you move from low n/N samples to samples with n/N = 1.
c. What do you notice as you compare these 10% samples: A, E, and J?
d. What do you notice as you compare these 25% samples: B, F, and K?
e. What do you notice as you compare these 3 distributions with sample size n = 40: A, D, and H?
f. What do you notice as you compare these 3 distributions with sample size n = 100: B, E, and I?
g. What happens to the concentration of the sampling distribution as n rises, holding n/N fixed (see questions c and d)?
h. What happens to the concentration of the sampling distribution as n/N rises, holding n fixed (see questions e and f)?
i. In general, for the more typical situation where the population is very large and the sample is less than a quarter of the population, does the accuracy of the point estimate depend on how big the sample is as a fraction of the population or on the absolute size of the sample?
j. Based on what you learned from the exercise above, is this statement true or false?: “A 10% sample should be sufficient in all situations. For a population of size 400, a survey with 47 people should be more than enough to get a sample estimate that is representative of the population.”

Appendix II: Classroom Table of Results

population	N = 400			N = 1000				N = 10,000
sample n	40	100	400	40	100	250	1000	40	100	1000	2500	10000
n/N	.10	.25	1.00	.04	.10	.25	1.00	.004	.01	.10	.25	1.00
	A	B	C	D	E	F	G	H	I	J	K	L
>.125-.175
>.175-.225
>.225-.275
>.275-.325
>.325-.375
>.375-.425
>.425-.475
>.475-.525
>.525-.575
>.575-.625
>.625-.675

Two Observations:

Takeaways:

Appendix III: Central Limit Theorem Assignment

Discuss the sample estimates of percentage female for sample size n = 1000 and population size N = 10,000 when p = .40, using data in HW6data.xls. (These data are exactly like the data you collected in HW5.)

a. Draw a bar graph for p-hat with p-hat on the x-axis and %frequency on the y-axis.
b. Explain how this bar graph compares to the theoretical binomial probability distribution of X/n when p = .4 and n = 1000.
c. Explain why p-hat can be considered a random variable. What kind of random variable is this sample estimate? A mean or a proportion?
d. Explain why this histogram is expected to have a normal distribution using the concept of the Central Limit Theorem.
e. Calculate the mean and standard deviation of the random variable, p-hat.
f. Standardize the random variable, p-hat, to have a mean of zero and a standard deviation of one. (Subtract the mean of the p-hat values from each p-hat, and divide each difference by the standard deviation of the distribution of p-hat.) Explain how to interpret the standardized values.
g. Draw the bar graph for Z, the standardized p-hat, from part (f), with Z on the x-axis and %frequency on the y-axis. How does this graph compare to the graph in part (a) above? How is it similar and how is it different?
h. What fraction of the values of p-hat is within one standard deviation of the mean of p-hat? What fraction is within two standard deviations of the mean of p-hat?
i. Are your two histograms from (a) and (g) perfectly normally distributed? How can you use the information from (h) to tell?
j. Why are they not perfectly normal?
k. Summarize the main thing you learned from this exercise.

Appendix IV: Confidence Interval Assignment

Suppose that the p-hat data in HW6data.xls are several (500 to be precise) random samples of size 1000 from a population of 10,000 where p = .4 (40%) of all voters prefer candidate A.

Plot the data in a bar graph where x/n is on the x-axis and % frequency is on the y-axis. What do you expect the shape to be? Draw a diagram of what you expect, labeling the axes appropriately. Explain why you expect this shape. Does the actual distribution of x/n look exactly like what you expected? How similar or different is it to what you expected?
What is the mean of your distribution of x/n? Label both the empirical mean (the average of the x/n values) and theoretical mean (p) on your graph (they might be quite close).
What is the theoretical standard deviation of your histogram (for p = 0.40 and n = 1000)? Label ±1 theoretical SD and ±2 theoretical SD from the theoretical mean on your histogram.
What fraction of your 500 p-hat or x/n values actually falls within 2 theoretical standard deviations of the true p?
Now calculate a 95% confidence interval (lower and upper bounds) for each of your 500 p-hat estimates of p.
How many of your 500 confidence intervals actually capture the true p of 40%? Is this what you expected? Explain carefully.
Draw a line under the histogram demarcating the position of the point estimate and width of its associated confidence interval for each of the confidence intervals that does not capture p = 40%. What do you notice about the distance of those point estimates from the p = 40%?
Interpret the meaning of a 95% confidence interval for any one particular point estimate, based on what you’ve learned above.
In practice, people do not know the true population mean and do not have several sample estimates of p; rather they have only one sample estimate of p. Relate this exercise to a real world situation where The Wall Street Journal might take one political poll of 1000 people to anticipate the result of the next presidential election. What would be the margin of error for this poll (for a 95% confidence interval)? Explain how you would interpret the result of this political poll now that you understand about sampling distributions. How confident would you be in the results of this poll? Explain.

Appendix V: Student Questionnaire

For each of these two statements, please underline the response that most closely represents your viewpoint:

“You need to obtain a sample that is at least 10% of the population in order to get a reliable estimate of the population parameter.”

1. I do not believe this
2. I am skeptical
3. Neutral/not sure
4. This seems right
5. I believe this strongly

Table 1: Distribution of Responses to Question 1

	Before		After
	n	%	n	%
I do not believe this	3	10.0	22	73.3
I am skeptical	4	13.3	2	6.7
Neutral/not sure	7	23.3	3	10.0
This seems right	16	53.3	2	6.7
I believe this strongly	0	0.0	1	3.3
total	30	100.0	30	100.0

“For large population sizes, the size of the population is irrelevant to the reliability of the sample estimate; what matters is the absolute size of the sample.”
1. I do not believe this
2. I am skeptical
3. Neutral/not sure
4. This seems right
5. I believe this strongly

Table 2: Distribution of Responses to Question 2

	Before		After
	n	%	n	%
I do not believe this	11	36.7	1	3.3
I am skeptical	9	30.0	0	0.0
Neutral/not sure	6	20.0	0	0.0
This seems right	4	13.3	3	10.0
I believe this strongly	0	0.0	26	86.7
total	30	100.0	30	100.0

References

Agresti, A., and Wackerly, D. (1977), "Some exact conditional tests of independence for r x c cross-classification tables," Psychometrika, 42, 111-125.

Margaret H. Smith
Department of Economics
Pomona College
Claremont, CA 91711
U. S. A.
msmith@pomona.edu