Estimating the number of fish that return to spawn using capture-recapture methods.

Manual Simulation using a sampling bowl

After obtaining a point estimate for the number of salmon returning to spawn, DFO would also like to know how precise and how accurate the estimate is.

Questions about accuracy and precision can only be answered by examining the sampling distribution of the point estimator. In real life only a single sample is drawn from the population (e.g., DFO carries out only one capture-recapture experiment), but to understand the performance of an estimator, statisticians must look at how it performs when the experiment is repeated many times. This is often done using statistical theory (which is beyond the scope of this course), but the same question can be explored with a simulation that mimics the real-life experiment.

Get the sampling bowl, paddles, and beads, and follow these directions for the experiment.

Notice that the estimates vary from trial to trial. Why? Carefully review what happened. The population size (the number of green beads initially in the bowl) was fixed and the same in all trials. The number of tagged fish (the number of white beads) was also fixed, as was the number of carcasses searched (the number of beads on the paddle). The only random quantity in the experiment was the number of tagged fish spotted in the carcass sample (the number of white beads found on the sampling paddle). This number varies because the paddle selects only a sample from the population of carcasses: even though the proportion of all carcasses carrying tags is fixed, the sample proportion varies from sample to sample. Finally, because the escapement estimate is computed from this random count (it appears in the denominator of the equation), the estimated escapement also varies from sample to sample.
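To make the random part of the experiment concrete, here is a minimal Python sketch of the bowl. It assumes the point-estimate section used the Lincoln-Petersen form, estimate = (number tagged) × (number searched) / (number of tagged fish found), which is consistent with the recaptured count sitting in the denominator. Only the population size of 4,000 comes from this module; the 400 white beads (`N_TAGGED`), the 100-bead paddle (`PADDLE_SIZE`), the seed, and the function names are hypothetical choices for illustration.

```python
import random

# Only N_TRUE (4,000 beads) comes from the module; the other two counts
# are hypothetical stand-ins for the white beads and the paddle capacity.
N_TRUE = 4000       # population size: beads (carcasses) in the bowl
N_TAGGED = 400      # assumed number of white (tagged) beads
PADDLE_SIZE = 100   # assumed number of beads picked up by one paddle

def paddle_draw(rng):
    """Return how many tagged (white) beads land on one paddle-full."""
    bowl = [1] * N_TAGGED + [0] * (N_TRUE - N_TAGGED)  # 1 = tagged, 0 = untagged
    return sum(rng.sample(bowl, PADDLE_SIZE))          # draw without replacement

def petersen_estimate(m2):
    """Lincoln-Petersen estimate N_TAGGED * PADDLE_SIZE / m2; None if m2 == 0."""
    return None if m2 == 0 else N_TAGGED * PADDLE_SIZE / m2

rng = random.Random(1)  # fixed seed so the run is repeatable
trials = []
for trial in range(1, 6):
    m2 = paddle_draw(rng)           # the only quantity that changes between trials
    est = petersen_estimate(m2)
    trials.append((m2, est))
    print(f"trial {trial}: tagged beads on paddle = {m2}, estimate = {est}")
```

Only `m2` changes from one trial to the next; the tagged count and paddle size never do, which is exactly why the printed estimates differ. A paddle with no white beads would make the estimate undefined, so the sketch returns `None` in that case.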

It is important to step back and look carefully at what you did. The sampling procedure was repeated with everything else held fixed. In real life you would not repeat an experiment many times; what you are doing here is examining the theoretical performance of the estimator to see what happens in the long run.

Draw a dot-plot or a histogram of the estimates from your 30 trials. This graph represents part of the sampling distribution of the estimator and can be used to examine the performance of the estimator. [The true sampling distribution would require us to look at all possible samples rather than just 30 trials.]
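As a rough computer analogue of the hand-drawn dot-plot, the sketch below repeats a simulated bowl experiment 30 times and prints a text histogram. It assumes the Lincoln-Petersen form of the estimator and a hypothetical bowl with 400 white beads and a 100-bead paddle (only the 4,000 total is from this module); the seed and the 500-bead bin width are arbitrary. Trials in which no tagged beads are found give no estimate and are reported separately.

```python
import random
from collections import Counter

N_TRUE = 4000                     # from the module
N_TAGGED, PADDLE_SIZE = 400, 100  # hypothetical bowl set-up
N_TRIALS = 30
BIN_WIDTH = 500                   # arbitrary histogram bin width (beads)

def one_estimate(rng):
    """One simulated trial: Lincoln-Petersen estimate, or None if no tags seen."""
    bowl = [1] * N_TAGGED + [0] * (N_TRUE - N_TAGGED)  # 1 = tagged, 0 = untagged
    m2 = sum(rng.sample(bowl, PADDLE_SIZE))
    return None if m2 == 0 else N_TAGGED * PADDLE_SIZE / m2

rng = random.Random(2024)
estimates = [one_estimate(rng) for _ in range(N_TRIALS)]
usable = [e for e in estimates if e is not None]
print(f"{N_TRIALS - len(usable)} trial(s) had no tagged beads (estimate undefined)")

# One '*' per estimate in each bin, like stacking dots on a dot-plot.
bins = Counter(int(e // BIN_WIDTH) * BIN_WIDTH for e in usable)
for lower in sorted(bins):
    print(f"{lower:6d} to {lower + BIN_WIDTH:6d} | {'*' * bins[lower]}")
```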

The term accuracy refers to the long-run average performance of an estimator over all possible samples, while the term precision refers to the variation of the estimator over all possible samples. The histogram or dot-plot drawn above gives a rough picture of both.

Because accuracy refers to the long-run average performance, compute the sample mean of the thirty estimates. Given that the actual number of beads in the bowl was 4,000, what can you say about the accuracy of the estimator? That is, does the estimator appear to be 'unbiased' for the true population value?
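The accuracy check can be sketched in a few lines of Python. The snippet simulates 30 trials with a hypothetical bowl (400 white beads, 100-bead paddle; only the 4,000 total comes from this module) and assumes the Lincoln-Petersen form of the estimator. To check your own trials instead, replace `estimates` with the thirty values from your data sheet. The difference between the average estimate and 4,000 is an estimate of the bias; with only 30 trials it is itself somewhat noisy.

```python
import random
import statistics

N_TRUE = 4000                     # true number of beads, from the module
N_TAGGED, PADDLE_SIZE = 400, 100  # hypothetical bowl set-up

def one_estimate(rng):
    """One simulated trial: Lincoln-Petersen estimate, or None if no tags seen."""
    bowl = [1] * N_TAGGED + [0] * (N_TRUE - N_TAGGED)  # 1 = tagged, 0 = untagged
    m2 = sum(rng.sample(bowl, PADDLE_SIZE))
    return None if m2 == 0 else N_TAGGED * PADDLE_SIZE / m2

rng = random.Random(7)
# Keep only defined estimates; swap in your own thirty values here if you like.
estimates = [e for e in (one_estimate(rng) for _ in range(30)) if e is not None]

mean_est = statistics.mean(estimates)
bias = mean_est - N_TRUE          # average estimate minus the true value
print(f"average of {len(estimates)} estimates: {mean_est:.0f}")
print(f"estimated bias: {bias:+.0f} beads")
```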

Similarly, because precision refers to the variability over all possible samples, we need to assess this variability. Compute the standard deviation of the thirty estimates. The technical term for the standard deviation of an estimator is the standard error (s.e.). For roughly bell-shaped data, the empirical rule indicates that about 95% of observations should lie within two standard deviations of the mean. Because the standard error is the standard deviation of the estimates over repeated experiments, we expect the empirical rule to hold here as well. Compute the average estimate ±2 standard errors using the standard error computed above. Does this interval contain about 95% of the values?
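The precision calculation follows the same pattern. This sketch again simulates 30 trials with a hypothetical bowl (400 white beads, 100-bead paddle, Lincoln-Petersen estimator assumed), takes the sample standard deviation of the estimates as the empirical standard error, and counts how many estimates fall inside the average ±2 s.e. As before, you can substitute your own thirty estimates for the simulated list.

```python
import random
import statistics

N_TRUE = 4000                     # true number of beads, from the module
N_TAGGED, PADDLE_SIZE = 400, 100  # hypothetical bowl set-up

def one_estimate(rng):
    """One simulated trial: Lincoln-Petersen estimate, or None if no tags seen."""
    bowl = [1] * N_TAGGED + [0] * (N_TRUE - N_TAGGED)  # 1 = tagged, 0 = untagged
    m2 = sum(rng.sample(bowl, PADDLE_SIZE))
    return None if m2 == 0 else N_TAGGED * PADDLE_SIZE / m2

rng = random.Random(7)
estimates = [e for e in (one_estimate(rng) for _ in range(30)) if e is not None]

mean_est = statistics.mean(estimates)
se = statistics.stdev(estimates)  # SD of the estimates = empirical standard error
lower, upper = mean_est - 2 * se, mean_est + 2 * se
share_inside = sum(lower <= e <= upper for e in estimates) / len(estimates)
print(f"s.e. = {se:.0f}; average ± 2 s.e. = ({lower:.0f}, {upper:.0f})")
print(f"fraction of estimates inside: {share_inside:.2f}")
```

Because a paddle with very few white beads produces a very large estimate, the estimates tend to have a long upper tail, so the 95% figure should be read as a rough guide rather than an exact target.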

Of course, DFO would like some confidence that its estimate is close to the true number of spawning fish, in this case 4,000. Provided the estimator is approximately unbiased, the empirical rule again tells us that roughly 95% of the estimates should fall within ±2 standard errors of the true value. Examine your data sheet and see what fraction of your estimates lie within ±2 standard errors of 4,000.
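The same check against the true value can be sketched as follows. It uses a hypothetical simulated bowl (400 white beads, 100-bead paddle, Lincoln-Petersen estimator assumed; only the 4,000 total is from this module) and counts the estimates lying within ±2 s.e. of 4,000 rather than of their own average. Replace `estimates` with your data-sheet values to apply it to your own trials.

```python
import random
import statistics

N_TRUE = 4000                     # true number of beads, from the module
N_TAGGED, PADDLE_SIZE = 400, 100  # hypothetical bowl set-up

def one_estimate(rng):
    """One simulated trial: Lincoln-Petersen estimate, or None if no tags seen."""
    bowl = [1] * N_TAGGED + [0] * (N_TRUE - N_TAGGED)  # 1 = tagged, 0 = untagged
    m2 = sum(rng.sample(bowl, PADDLE_SIZE))
    return None if m2 == 0 else N_TAGGED * PADDLE_SIZE / m2

rng = random.Random(11)
estimates = [e for e in (one_estimate(rng) for _ in range(30)) if e is not None]

se = statistics.stdev(estimates)  # empirical standard error
# Distance is measured from the TRUE value, not from the average estimate.
near_truth = sum(abs(e - N_TRUE) <= 2 * se for e in estimates) / len(estimates)
print(f"s.e. = {se:.0f}")
print(f"fraction within ±2 s.e. of {N_TRUE}: {near_truth:.2f}")
```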

In general, estimators that are unbiased and have small standard errors are preferred. If an estimator is unbiased, its average value (when averaged over repeated experiments) equals the true population value. If an unbiased estimator also has a small standard error, one can be fairly confident that any single estimate will be close to the true value in the population.

The next module will examine which characteristics of the sampling procedure control accuracy and precision.

Questions

  1. What is the true population size in this experiment? [This is usually unknown in real life.]

  2. Did every experiment estimate the true population size without error? Can you predict whether a particular experiment will over- or under-estimate the true value?

  3. Does this method, on average, predict the correct population size? What does "on average" mean?

  4. Explain why the value of 261,000 (found in the section on the point estimate) is not an exact figure for the total number of spawning Sockeye Salmon that returned to the Chilko River.

  5. Are there potential sources of bias? For example, what do you think would happen if the assumptions were violated?
