An On-Line Workshop Using a Simple Capture-Recapture Experiment to Illustrate the Concepts of a Sampling Distribution

Carl James Schwarz and Jason Sutherland
Simon Fraser University

Journal of Statistics Education v.5, n.1 (1997)

Copyright (c) 1997 by Carl James Schwarz and Jason Sutherland, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Teaching statistics; Statistics education; World Wide Web material; XLISP-STAT.

Abstract

We describe a World Wide Web-accessible workshop designed for students in an introductory statistics course that uses a capture-recapture experiment to illustrate the concept of a sampling distribution. In addition to the usual "sampling bowl" experiment, the workshop contains a computer simulation program written in XLISP-STAT that will allow students to further investigate the properties of the estimator.

1. Introduction

1 There appears to be a growing consensus that hands-on, activity-based learning is a good way to teach students the important concepts in an introductory statistics course. This paper describes a workshop for students on the use of capture-recapture methods to estimate the size of a fish population. The use of capture-recapture experiments to illustrate the concepts of a sampling distribution is not new (e.g., Scheaffer, Gnanadesikan, Watkins, and Witmer 1996, p. 192), but this paper:

Uses the WWW to distribute the material, and

Includes a computer program written in XLISP-STAT for extensive simulation of the process.

The workshop can be used as a stand-alone unit in an introductory statistics course or as part of a supplemental laboratory.

2 Rather than presenting the actual content of the workshop here, this paper will review the structure of the workshop, discuss hardware and software requirements for the simulation program, and summarize our experiences with actual class use. The workshop can be viewed by using a World Wide Web (WWW) browser to visit
http://jse.amstat.org/v5n1/schwarz.supp/index.html

2. Workshop Structure

3 The material has been organized into a hyperlinked structure starting with an overview page that outlines the objectives of the workshop, the statistical pre-requisites for students, the equipment needed, and finally the WWW-links to the four modules of the workshop. These cover background material, the development of the point estimator, a description of how to perform an experiment using a sampling bowl, and a computer simulation of the process in order to investigate the performance of the estimator in greater depth. Questions are posed at the end of each section that could form part of a written assignment.

2.1 Background Material

4 A common complaint of students is that they cannot see the relevance of the material learned in class to the real world. In the last few years, Canada has seen the collapse of the Eastern cod fishery, and many problems with dwindling stocks of salmon on the west coast. This has been extensively reported in the news, and few students are unaware of the problems with the fishery. The background section includes a brief overview of the life cycle of the sockeye salmon, and a description of how the Department of Fisheries and Oceans (Canada) uses capture-recapture methods to estimate the number of salmon that return to spawn.

2.2 Point Estimator

5 This section of the workshop illustrates how the estimator is derived and outlines the assumptions made. A sample computation is provided. It also tries to justify to the student why the estimator is sensible using consistency arguments.
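
As a rough sketch of the kind of argument involved, and assuming the estimator is the standard Lincoln-Petersen form (an assumption; the formula itself appears only on the workshop pages), let $n_1$ be the number of fish tagged at time 1, $n_2$ the number captured at time 2, and $m_2$ the number of tagged fish among those $n_2$. This notation is introduced here for illustration only. Equating the tagged fraction in the second sample with the tagged fraction in the population gives:

```latex
\frac{m_2}{n_2} \approx \frac{n_1}{N}
\quad\Longrightarrow\quad
\hat{N} = \frac{n_1 \, n_2}{m_2}
```

For example, with the hypothetical values $n_1 = 100$, $n_2 = 50$, and $m_2 = 10$, the estimate is $\hat{N} = 100 \times 50 / 10 = 500$. The estimate is undefined when $m_2 = 0$.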

2.3 Sampling-Bowl Experiment

6 This section of the workshop guides the student in using a sampling bowl and two colors of beads to simulate the experiment with a known population. The students are asked to perform multiple simulations and examine the results using dot plots or histograms and by computing the mean and standard deviation over the simulations.
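
The bowl experiment described above can be mimicked in a few lines of code. The following is a minimal sketch in Python, not the workshop's own material. It assumes a Lincoln-Petersen-type estimate, and the function name `bowl_replicate`, the bead counts, and the paddle size are all hypothetical choices made for illustration.

```python
import random
import statistics

def bowl_replicate(pop_size, n_tagged, paddle_size, rng):
    """One bowl replicate: draw a paddle of beads and return the estimate.

    Assumes a Lincoln-Petersen-type estimate (an assumption, not taken
    from the workshop). Returns None when no tagged beads are drawn.
    """
    # 1 = bead of the "tagged" color, 0 = bead of the other color
    bowl = [1] * n_tagged + [0] * (pop_size - n_tagged)
    paddle = rng.sample(bowl, paddle_size)  # sampling without replacement
    recaptured = sum(paddle)
    if recaptured == 0:
        return None  # estimate is undefined with no tagged beads
    return n_tagged * paddle_size / recaptured

# Hypothetical setup: 1000 beads, 100 of the tagged color, paddle of 50
rng = random.Random(1997)
estimates = [bowl_replicate(1000, 100, 50, rng) for _ in range(20)]
estimates = [e for e in estimates if e is not None]

print("number of usable replicates:", len(estimates))
print("mean of estimates:", statistics.mean(estimates))
print("sd of estimates:  ", statistics.stdev(estimates))
```

With only 20 replicates the mean and standard deviation change noticeably from run to run (try a different seed), which is the same instability students see with the physical bowl.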

7 A sampling bowl and beads can be purchased from a scientific supply company, or alternatively, can be easily constructed from readily available materials as outlined on the WWW pages for this module. Scheaffer et al. (1996) use two different types of goldfish crackers in place of colored beads. This could be suitable for one-time demonstrations, but would not be robust enough to be used for many years. As well, a sampling paddle will ensure that the student will obtain equal-sized samples for each replicate of the experiment rather than just grabbing a handful of crackers.

2.4 Computer Simulation

8 The usefulness of a sampling bowl to illustrate the concept of a sampling distribution is limited by the size of the problem that can be simulated, equipment needs for a large class, and the length of time needed to perform even a moderate number of replications. Furthermore, graphs and summary statistics of the results based on only a few simulations are highly unstable. It is difficult to study the performance in any great detail. However, the role of the sampling bowl should not be underestimated -- as discussed later, it is very helpful to students in understanding the simulation results.

9 For these reasons, a simulation program was written to remove the tedium from the sampling bowl demonstration. The program is written in XLISP-STAT (Tierney 1990) and is downloaded over the WWW to run on the student's machine. The advantages of using XLISP-STAT over conventional languages are well known and include:

No cost for the interpreter.

Platform independence of the programs. Versions of the interpreter are available for most major platforms and the same XLISP-STAT program will give identical results on all platforms.

High level dynamic graphs are available.

High level user interfaces are available.

10 After downloading the simulation program, the WWW-browser will start XLISP-STAT and start the simulation program. The simulation program is virtually self-contained and does not require any prior experience with XLISP-STAT. When the program has started, the student is presented with a screen that will look similar to Figure 1.

Figure 1. The Initial Startup Screen When the Simulation Program Has Been Loaded.

11 To start a simulation, the student selects the SIMULATIONS menu item and enters the true population size (N), the sample sizes at time 1 and 2, and the number of simulations to perform (Figure 2).

Figure 2. An Example of the Input Screen Used to Set the Simulation Parameters.

12 After the simulations are complete, a window is drawn on the screen containing a histogram of the estimates, a fitted normal curve, and summary statistics on the mean and standard deviation of the estimator over the simulations (Figure 3).

Figure 3. A Sample Histogram Created by the Simulation.
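
For readers without XLISP-STAT, the simulation step can be approximated in other languages. The following Python sketch is not the authors' program: it assumes a Lincoln-Petersen-type estimate, discards replicates with no recaptures (an assumption about how to handle the undefined case), prints a text histogram in place of the graphical one, and omits the fitted normal curve. The names `simulate` and `text_histogram` and all parameter values are hypothetical.

```python
import random
import statistics

def simulate(pop_size, n1, n2, n_sims, seed=None):
    """Simulate n_sims capture-recapture experiments; return the estimates."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sims):
        # Fish 0..n1-1 are the tagged ones; the second sample is drawn
        # without replacement from the whole population.
        second = rng.sample(range(pop_size), n2)
        m2 = sum(1 for fish in second if fish < n1)
        if m2 > 0:  # estimate undefined when no tagged fish are recaptured
            estimates.append(n1 * n2 / m2)
    return estimates

def text_histogram(values, n_bins=10, width=40):
    """Return (bin counts, printable lines) for a crude text histogram."""
    lo, hi = min(values), max(values)
    bin_width = (hi - lo) / n_bins or 1.0  # avoid zero width if all equal
    counts = [0] * n_bins
    for v in values:
        idx = min(int((v - lo) / bin_width), n_bins - 1)
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * bin_width
        lines.append(f"{left_edge:8.1f} | " + "#" * round(width * c / peak))
    return counts, lines

# Hypothetical parameters: N = 1000, 100 fish in each sample, 1000 simulations
est = simulate(1000, 100, 100, 1000, seed=1)
counts, lines = text_histogram(est)
print("\n".join(lines))
print("mean of estimates:", round(statistics.mean(est), 1))
print("sd of estimates:  ", round(statistics.stdev(est), 1))
```

Comparing the printed mean with the true population size passed to `simulate` gives a quick check on the behavior of the estimator under the chosen parameters.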

13 The results of the simulation are left on the screen until dismissed by the student. Several different simulations may be performed varying the parameters, and the results from each set will be displayed in separate windows on the screen (Figure 4). The student may resize and move the individual windows. Furthermore, the student can force all windows to be the same size and all histograms to have the same scales by selecting an option from the SIMULATIONS pull down menu. This makes comparisons of the results under different scenarios easy.

Figure 4. An Example of the Results of Two Simulations Presented Simultaneously. The student is able to resize and rescale each histogram individually, or is able to force all histograms to have the same size and scale.
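
The side-by-side comparison of scenarios can also be sketched numerically. The Python fragment below is an illustration under stated assumptions, not part of the workshop: it assumes a Lincoln-Petersen-type estimate, fixes a hypothetical total effort of 200 fish handled, and splits that effort in different ways between the first (tagging) and second (recovery) samples. The function name `summarize_allocation` and all numbers are invented for the example.

```python
import random
import statistics

def summarize_allocation(pop_size, n1, n2, n_sims, rng):
    """Return (mean, sd) of the estimate over n_sims simulated experiments."""
    estimates = []
    for _ in range(n_sims):
        # Fish 0..n1-1 carry tags; draw the second sample without replacement
        second = rng.sample(range(pop_size), n2)
        m2 = sum(1 for fish in second if fish < n1)
        if m2 > 0:  # skip replicates where the estimate is undefined
            estimates.append(n1 * n2 / m2)
    return statistics.mean(estimates), statistics.stdev(estimates)

rng = random.Random(42)
TOTAL_EFFORT = 200  # hypothetical total number of fish handled

# Compare three ways of splitting the same total effort
for n1 in (40, 100, 160):
    n2 = TOTAL_EFFORT - n1
    mean, sd = summarize_allocation(1000, n1, n2, 2000, rng)
    print(f"n1={n1:3d}  n2={n2:3d}  mean={mean:8.1f}  sd={sd:7.1f}")
```

Printing the results in one table plays the same role as forcing all histograms to a common scale: the scenarios can be compared directly on spread as well as on center.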

14 A help screen is available that explains how to use the program (refer to Figure 1).

3. Hardware and Software Requirements

3.1 Software Requirements

15 All platforms (UNIX, Macintosh, and non-Macintosh) will require XLISP-STAT and Netscape or equivalent. The bulk of the software testing has been done on Macintosh machines with only occasional testing on non-Macintosh machines. We have tested the simulation program with:

XLISP-STAT 3.44. There have been reports that it will not run on earlier versions, but this should not be a problem as XLISP-STAT is freely available for all platforms and can be upgraded at any time.

Netscape 2.02 and Netscape 3.0. Some students have reported a minor printing problem where certain pages containing inline images will not print completely (typically the last five lines on a page are not printed). If that page only is selectively reprinted, it prints completely.

3.2 Hardware Requirements

16 Because of the widely differing hardware and operating system options found on UNIX workstations, we are unable to outline hardware requirements with any degree of certainty. However, most workstations have large amounts of RAM, so we do not expect a problem in running the WWW-browser and XLISP-STAT simultaneously.

17 Our student computer lab is equipped with Macintosh PowerPCs running System 7.5 with 16 Mb RAM. We have had no reports of problems from students running the simulation program on these machines. The authors have also tested the program on a Macintosh 040 machine with 16 Mb RAM and also found no problems.

18 Some students have smaller Macintosh machines at home, and they (along with a referee of this paper) have experienced problems when they have only 8 Mb RAM. It appears that the system software requires over 3 Mb RAM and Netscape another 4 Mb RAM, leaving insufficient memory to also run XLISP-STAT. In these cases we recommend that the students print the pages describing the simulation module, save the simulation program to their local disk using the browser, quit the WWW browser, and then launch XLISP-STAT alone and run the simulation program.

19 We have had only limited experience with non-Macintosh hardware. Some students have tried to use the software on their non-Macintosh machines at home. The most common problem reported is that 8 Mb RAM is again insufficient to run Windows, Netscape, and XLISP-STAT simultaneously. We again recommend that they proceed as outlined above for Macintosh machines with small amounts of RAM.

4. Class Experiences

20 We have used this set of instructional materials in two settings:

As part of a short "Women-in-Math" promotional talk for high school students. This consisted of about an hour presentation to a small group (about 20) of female high school students who were interested in mathematics. This group worked together using the bead bowl and simulation program. The same type of presentation has also been given to high school mathematics teachers at their fall conference.

As part of a typical introductory statistics class at our university. These are large classes of about 200 students from the biological sciences. For this class, the sampling bowl was used in a classroom demonstration, followed by a demonstration of the computer simulation, and finally by a few short questions on the next assignment, in which students had to run two simulations with different sets of simulation parameters and comment on the results.

21 We found that the sampling bowl demonstration is essential for removing the aura of "black magic" from the simulation results. In other computer-aided exercises in our classes, we have found that students tend to accept the results uncritically, feeling perhaps that if the computer has produced the answers, the answers must be correct, regardless of any input errors they may have made.

22 In reviewing our regular "three-minute assessments," where students are asked what the important points of the last week were or what was confusing about the last week, students often pointed out that the sampling bowl was important for them to see that estimates will vary. They were also interested in seeing how this simple method was used in estimating salmon spawning populations. Reviews of the computer simulation were mixed. Some students felt that the computer simulation added little to their understanding and was just "busy work." Other students commented that actually being able to compare different scenarios was interesting, but they were confused by the fact that the role of "sample size" in controlling precision is not well defined in this problem. (We suspect that this is related to the concepts presented in class that sample size controls precision and randomization controls representativeness.) Other comments indicated that they had a better appreciation for the role of a statistician in these types of experiments, where the allocation of effort (between tagging and recovery) is often a concern.

5. Summary

23 There are many advantages to using the WWW to deliver course material to students compared to conventional printed materials:

It can deliver a mixture of media (text, graphics, programs).

Instructors can customize material easily.

Workshops can be updated quickly and easily, and updates are immediately available.

It is suitable for distance education courses.

It has a "just-in-time" delivery that avoids problems of ordering and storing printed materials.

It can be enriched with hyperlinks to supplemental material.

24 We believe that these advantages outweigh the concerns that a computer must be available to access the material and that some students are uncomfortable with using a computer.

25 The concept of a sampling distribution is one of the most difficult concepts for students to grasp in an introductory course. The distribution of actual data points is easy to see; the distribution of a hypothetical population of individual units is easy to visualize; the distribution of an estimate over repeated hypothetical samples is one extra level of abstraction and not something that most students would have experienced. The sampling bowl shows them direct evidence that the estimate will vary from sample to sample and provides a conceptual framework to understand the simulation results. The computer simulation helps them explore the effects of sample sizes on the resulting distribution and gives a feel for the types of decisions (should effort be allocated to tagging or recovering fish?) on which statisticians are called for assistance.

Acknowledgments

This research was partially funded by the Natural Sciences and Engineering Research Council (NSERC) of Canada. The authors would like to thank the referees for their helpful comments and suggestions.

References

Tierney, L. (1990), LISP-STAT, New York: John Wiley.

Scheaffer, R. L., Gnanadesikan, M., Watkins, A., and Witmer, J. A. (1996), Activity-Based Statistics: Instructor Resources, New York: Springer.

Carl James Schwarz (author for contact)
Department of Mathematics and Statistics
Simon Fraser University
Burnaby, BC, Canada V5A 1S6

cschwarz@cs.sfu.ca

Jason Sutherland
Department of Mathematics and Statistics
Simon Fraser University
Burnaby, BC, Canada V5A 1S6
