# A Series of Tutorials for Teaching Statistical Concepts in an Introductory Course I. Sampling From an Aerial Photograph

Glenys Bishop

Journal of Statistics Education v.6, n.2 (1998)

Copyright (c) 1998 by Glenys Bishop, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the author and advance notification of the editor.

Key Words: Excel; Natural resources science; Proportions; Sampling distribution; Spatial statistics.

## Abstract

This paper outlines one of a series of tutorials developed as part of an introductory statistics course for Agricultural and Natural Resource Sciences students. Here we compare two methods of sampling from an aerial photograph to obtain an estimate of the proportion of a particular type of vegetation. One method, transect sampling, is traditionally used by field ecologists, while the other is simple random sampling in a plane. Preparation details and possible extensions to the tutorial are described.

# 1. Introduction

1 Faced with teaching a compulsory introductory statistics subject to two large groups of Agricultural and Natural Resource Sciences students, we searched for motivating examples to be used in both lectures and tutorials. This is the first of several papers describing the tutorials we have developed. Our target audience consisted of first year Agricultural and Natural Resource Sciences students, but some of the examples are more widely applicable.

2 Students participate in eleven tutorials throughout the semester. In one tutorial we use the fruitfly data of Hanley and Shapiro (1994) to teach hypothesis testing, while in another we use the conditional probability examples of Rossman and Short (1995). These examples have been well received by the students.

3 In this and subsequent papers, we shall describe some of the tutorials that we have developed. The tutorials are conducted in computer laboratories, but access to a computer is not essential for the exercise discussed here.

4 We use Microsoft Excel for calculations, graphics, and statistical functions. However, any statistical package with a random number generator, or just a table of random digits, will suffice for this exercise.

# 2. Teaching Goals

5 The primary goal of this tutorial is to enable students to compare transect sampling, traditionally used by ecologists, with simple random sampling. We also want to prepare students for the idea of the sampling distribution of a proportion by recording the values obtained by all the students.

6 A more detailed objective of this tutorial is to teach students how to select a sample -- in this case, a sample of points from an aerial photograph. To do this, they must understand the concept of randomness and how to use either a random number generator or a table of random digits. In this tutorial we use random number generation from the Excel analysis tools.

# 3. Background

7 Rangeland managers and field ecologists often need to estimate the area covered by a particular vegetation type within a region. Although remote sensing by satellite imaging has made this task easier, it is often necessary to employ simpler methods. An aerial photograph provided by courtesy of the Resource Information Group, Department of Environment and Natural Resources, South Australia, illustrates the types of information that may be sought.

8 The photograph in Figure 1 shows the Sandy Creek Conservation Park and surrounding areas about 50 kilometres north of Adelaide. The conservation park is in the bottom left quadrant of the picture. The surrounding area includes roads, a river, cleared and uncleared farmland, plantations, dams, and a few houses. The whole region is fairly flat.


Figure 1. Aerial Photograph of the Sandy Creek Conservation Park.

9 Transect sampling when non-moving objects are to be counted involves choosing a line or series of lines along which the counts are to take place. The transects may be chosen randomly, or the first may be chosen randomly and subsequent transects systematically. They may be parallel, perpendicular, or at some other angle suitable for the situation.

10 An alternative method is point sampling. Imagine a grid placed over the area such that the grid lines are as close together as is practical. For instance, they may be a metre or half metre apart in a conservation park. Coordinate pairs are randomly generated, and the points represented by those pairs are examined for the presence or absence of the objects of interest.

# 4. Logistics

11 To assist others in running this tutorial we have prepared details about the materials required and also guidelines for tutors. They should be read in conjunction with the student notes shown in the Appendix at the end of this paper.

## 4.1. Materials

12 Students should be able to easily distinguish undisturbed bushland in the photograph. We have experimented with photocopies and found that the photographic button on a photocopying machine produces a clear copy of a photograph. Copies of a photocopy are not clear enough to be useful. In each tutorial class, we have two or three original photographs, laminated for protection, so that students can view the fine details.

13 So that students can find randomly generated points, we have glued a measurement scale to each edge of the photograph before copying it. We obtained these scales by photocopying a ruler marked in millimetres onto white paper.

## 4.2. Guidelines for Tutors

14 In the week before the tutorial, students should be advised to familiarise themselves with the problem as described in the first part of the student notes (see the Appendix). They should also be told to bring a ruler and a highlighter pen to the class.

15 The tutorial is designed to last for 50 minutes. It is important to keep things moving, because the main aim of the exercise can be met only when all students have collected two different samples and calculated estimates.

16 The tutorial can be divided into three parts: definition, execution, and conclusions. First, divide the students into discussion groups of about four to define the regions they will regard as undisturbed bushland. Allow five minutes for this discussion, and then hold a five-minute forum to establish a class definition of undisturbed bushland. This definition must be operational; that is, students must be able to decide whether any point on the photograph is undisturbed bushland or not.

17 Two points should emerge from the forum. To compare sampling methods, the same definition of bushland must be used for both methods. Furthermore, if students' estimates are to be directly comparable, they must all be using the same definition.

18 The tutorial now moves into the execution stage. Ensure that students clearly highlight the undisturbed bushland areas on their photocopies of the photograph. (In the past, we have found that the means of all students' estimates for the two methods differ substantially. This is probably because many students classify isolated clumps of trees as bushland in simple random sampling, but not in transect sampling.) Warn students that highlighting and sample selection should take no more than 20 minutes.

19 The last stage of the tutorial involves discussion about what can be learnt from the data and reaching conclusions. Collect each student's estimates on the board so that everyone can see them. The estimates should be arranged in two columns: transect and simple random sample. Get students to discuss the precision and usefulness of the two methods. Discussion points can include

• A comparison of five number summaries for the two methods to see which estimate is less variable or more precise,

• An examination of the fairness of the above comparison, taking into consideration the amount of effort required to obtain each estimate in class and how this might change in the field,

• The usefulness of each method in terms of precision -- that is, the range of class estimates should be small enough to give us a reasonable idea of the proportion of native bushland, and

• Consideration of improving precision by increasing the number of points or transects sampled.

# 5. Extensions

20 If more time is available there are several issues that can be developed from this tutorial. The most obvious is the sample size effect for the simple random sample. We have used 20 points because that number was thought to be achievable in the allocated time. As a short cut, you could ask students to use the first 10 points to obtain an estimate, and then all 20 points.
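To give a rough idea of what the 10-point versus 20-point comparison might show, the following sketch computes the standard error of a sample proportion, sqrt(p(1 - p)/n), for independent points. The function name and the value p = 0.3 are illustrative assumptions; the paper does not report a true proportion for the photograph.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion based on n independent points."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Illustrative proportion only (an assumption, not a value from the paper).
p_assumed = 0.3
se_10 = proportion_standard_error(p_assumed, 10)
se_20 = proportion_standard_error(p_assumed, 20)
print(f"n=10: {se_10:.3f}  n=20: {se_20:.3f}  ratio: {se_20 / se_10:.3f}")
```

Whatever value of p is assumed, doubling the number of points reduces the standard error by a factor of 1/sqrt(2), or about 0.71, so students should expect a modest rather than dramatic gain in precision.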

21 More advanced students could be asked to examine other sampling schemes such as systematic sampling. In this method, r parallel transects at equal intervals are examined; the first transect is selected randomly in the range 1 to (180/r). For this example, if ten transects are to be used, the first is chosen by generating a random number between 1 and 18 (180/10). Each subsequent transect is 18 mm below the previous one.
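The systematic scheme just described can be sketched in a few lines. This is a minimal illustration, assuming that positions are whole millimetres measured down the 180 mm side and that r divides 180 exactly; the function name `systematic_transects` is hypothetical.

```python
import random
from typing import List, Optional


def systematic_transects(r: int, side_mm: int = 180,
                         rng: Optional[random.Random] = None) -> List[int]:
    """Positions (mm from the top) of r equally spaced horizontal transects."""
    if r <= 0 or side_mm % r != 0:
        raise ValueError("this sketch assumes r divides side_mm exactly")
    rng = rng or random.Random()
    interval = side_mm // r            # 180 / 10 = 18 mm in the example above
    first = rng.randint(1, interval)   # random start between 1 and 180/r inclusive
    # Each later transect lies one interval below the previous one.
    return [first + k * interval for k in range(r)]


positions = systematic_transects(10, rng=random.Random(1998))
print(positions)
```

Only the first position is random; the remaining nine are determined by it, which is what distinguishes this scheme from choosing ten transects independently.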

22 Points can also be chosen systematically instead of randomly. They can be selected at regular or irregular intervals along parallel transects. The aim is to give representative coverage of the area, while avoiding following features of the landscape such as streams, fencelines, and ridges. Methods that simulate what an ecologist might do on foot in a park, when no aerial photograph exists, could be discussed. Buckland, Anderson, Burnham, and Laake (1993) discuss some of the practicalities of these methods when introducing principles of distance sampling.

23 Transect sampling is also used in microscopic work. For example, physiologists count certain cell types, and engineers examine grain structure in metals. Commonly, a grid available on the microscope is used to take systematic samples. Photographs taken under the microscope could be used in place of the aerial photograph or as an extension to illustrate other uses of transect sampling.

24 In subsequent lectures, when introducing the sampling distribution of a proportion, we have found it useful to refer to the variety of estimates obtained by students. The number of points, from a simple random sample of 20, that are in undisturbed bushland may be used as an example of a binomial variable. However, because points on the transect are not independent of one another, the number of points out of 400 in undisturbed bushland is not a binomial variable.
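As a small illustration of the binomial model for the simple random sample, the sketch below tabulates the probability of each possible count of bushland points out of 20. The proportion p = 0.3 is an assumed value chosen only for illustration, and the function name is ours.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) when X counts successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


n_points = 20
p_assumed = 0.3  # illustrative only; not a value from the paper

distribution = [binomial_pmf(k, n_points, p_assumed) for k in range(n_points + 1)]
for k, prob in enumerate(distribution):
    print(f"{k:2d} points in bushland: {prob:.4f}")
```

The same calculation would not be valid for the 400 transect points, because neighbouring points along a line tend to fall in the same vegetation type and so are not independent trials.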

25 As an assignment question, we ask the students to calculate a 95% confidence interval for the proportion, p, of the whole area that is in undisturbed bushland using their own simple random sample estimate. We also ask them to explain why the formula for the 95% confidence interval is inappropriate for the transect method. Most of them can see that the points are not independent.
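The paper does not state which interval formula the students use. The sketch below assumes the usual large-sample (normal approximation) interval, p-hat plus or minus 1.96 times sqrt(p-hat(1 - p-hat)/n), and the count of 6 bushland points out of 20 is a made-up example.

```python
import math
from typing import Tuple


def wald_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need 0 <= successes <= n and n > 0")
    p_hat = successes / n
    margin = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    # Clip to [0, 1], since a proportion cannot fall outside that range.
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)


# Hypothetical student result: 6 of the 20 sampled points lie in bushland.
low, high = wald_interval(6, 20)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With only 20 points the interval is wide, which ties back to the discussion of precision. The formula relies on the points being independent, which is why it does not carry over to the transect method.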

## Acknowledgments

This work was conducted with the aid of a University of Adelaide Teaching Development Grant. I wish to thank two anonymous referees for their very helpful suggestions.

# Appendix: Student Notes

## A.1. Learning Objectives

In this tutorial we shall learn

• Some ways of comparing different sampling methods,

• That estimates of the same thing, based on different samples, will vary, and

• How to select a random sample using random numbers.

## A.2. The Problem

A common problem in rangeland management is the estimation of areas covered by a particular vegetation type in a region. In this workshop, we are going to compare two different rangeland surveying techniques, both of which involve sampling.

### Method 1: Line Transect Method

This method is suitable for flat, low-lying scrub with clear delineations among vegetation types. Walk in a straight line from a random starting point and either count paces or use a tape measure to find the proportion of the whole traverse that intersects the vegetation type of interest.

Since we have a good view of the region, we may decide to choose a representative or best path to take for our estimate, but this leaves us open to the possibility of biasing our estimate towards a preconceived idea of the area. On the other hand, choosing a line at random is only the best method when the vegetation type of interest occurs in random clumps, clearly a rare property in practice.

A common compromise is to take two lines, one at right angles to the other. In this way we are likely to pick up any clumping.

### Method 2: Grid Estimation Method

Given the dimensions of the region, randomly choose a subset of point coordinates to sample. Walk to each point and decide whether it lies in the vegetation type of interest. Find the number of points in the vegetation as a proportion of all points examined. In each case, this proportion is an estimate of the proportion of the total area of the region covered. We can do something similar in the laboratory by using an aerial photograph.

## A.3. Initial Discussion

You have been given a photograph (scale 1:16000) of the Sandy Creek Conservation Park and surrounds near Gawler taken in March 1979. The aim is to determine how much of the whole region is "undisturbed bushland" as this has implications for the park fauna.

1. In discussion groups, decide which of the following you will regard as undisturbed bushland. The original photograph is with the tutor for clearer inspection.

• River banks
• Paddocks with substantial tree cover
• Plantations

2. Still in groups, discuss reasons why it is important to define the undisturbed bushland regions before you start.

3. Share your ideas with the whole class.

4. Once the class has settled on a definition, use a highlighter pen to mark boundaries around all areas of undisturbed bushland. The highlighter should be at the perimeter of the bushland but not in the bushland.

Eyeball Estimate: Just looking at the photograph, what proportion of the region do you think is undisturbed bushland? Enter this value on the tutor's data sheet.

## A.4. Executing Method 1

1. The left-hand side of the map is 180 mm long. To choose a starting point, select a random number between 0 and 180.

2. Write down the selected random starting point (in millimetres) for the left-hand side.

3. Draw a horizontal line across the picture starting at this point. Write down the length of the line lying in undisturbed bushland -- that is, within the boundaries you drew previously.

4. Repeat Steps 1 through 3 for a random starting point along the top of the picture. (N.B. This time you want a starting point between 0 and 220.)

5. Add the two lengths of undisturbed bushland together.

6. The length of the horizontal line is 220 mm, and the length of the vertical line is 180 mm. Use your answer from Step 5 to estimate the proportion of the total area in the photograph that is undisturbed bushland.

7. Write your answer from Step 6 on the tutor's data sheet.
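The arithmetic in Steps 5 and 6 can be sketched as follows. This assumes the natural reading of Step 6, namely that the combined bushland length is divided by the combined length of the two lines (220 + 180 = 400 mm). The measurements of 62 mm and 38 mm are invented for illustration.

```python
def transect_estimate(horizontal_bush_mm: float, vertical_bush_mm: float,
                      horizontal_len_mm: float = 220.0,
                      vertical_len_mm: float = 180.0) -> float:
    """Proportion of the two transects that lies in undisturbed bushland."""
    total_bush = horizontal_bush_mm + vertical_bush_mm    # Step 5
    total_length = horizontal_len_mm + vertical_len_mm    # 220 + 180 = 400 mm
    if not 0 <= total_bush <= total_length:
        raise ValueError("bushland length cannot exceed the transect length")
    return total_bush / total_length                      # Step 6


# Hypothetical measurements: 62 mm of bushland on the horizontal line,
# 38 mm on the vertical line.
print(transect_estimate(62, 38))
```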

## A.5. Executing Method 2

Imagine a grid of one-millimetre squares drawn on the photograph. Each of the intersections is a point either lying in undisturbed bushland or not. Since this grid is very fine, the proportion of the points which lie in undisturbed bushland will be close to the corresponding proportion of the area. We will sample 20 points at random by selecting random coordinates from the top and side scales 20 times.

1. Set up a table for your sampled points under the headings Top, Side, and Bush.

2. Generate 20 top coordinates by selecting 20 random numbers between 0 and 220. Enter these numbers in the Top column of your table.

3. Next to these, in the Side column of your table, enter 20 random numbers in the range 0 to 180.

4. Find the point on the photograph given by the Top and Side coordinates in the first row of your table. If the point is in undisturbed bushland, as marked by your highlighted boundaries, enter a 1 under the Bush heading. Enter a 0 otherwise.

Suppose your first coordinate pair is Top = 25 mm, Side = 150 mm, and the point lies in farmland. Then your table would look like this:

```
Top     Side    Bush
25      150      0
```

5. Repeat Step 4 for all 20 points.

6. Count the number of 1's in the Bush column and divide the count by 20 to estimate the proportion of undisturbed bushland in the region.
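The steps above can be mimicked in code. The tutorial itself uses Excel's random number generation, so Python's `random` module here is only a stand-in. The function `is_bush_demo` is a hypothetical substitute for looking at the highlighted photograph: it simply treats one rectangle as bushland.

```python
import random
from typing import Callable, List, Optional, Tuple

Point = Tuple[int, int]  # (Top, Side) coordinates in millimetres


def sample_points(n: int = 20, top_max: int = 220, side_max: int = 180,
                  rng: Optional[random.Random] = None) -> List[Point]:
    """Steps 2 and 3: n random (Top, Side) coordinate pairs."""
    rng = rng or random.Random()
    return [(rng.randint(0, top_max), rng.randint(0, side_max)) for _ in range(n)]


def estimate_proportion(points: List[Point],
                        is_bush: Callable[[int, int], bool]) -> float:
    """Steps 4 to 6: score each point 1 or 0, then average the Bush column."""
    bush_column = [1 if is_bush(top, side) else 0 for top, side in points]
    return sum(bush_column) / len(bush_column)


def is_bush_demo(top: int, side: int) -> bool:
    """Hypothetical classifier: pretend one rectangle of the photo is bushland."""
    return top <= 110 and side >= 90


points = sample_points(rng=random.Random(42))
print(estimate_proportion(points, is_bush_demo))
```

Running the sketch with different seeds gives different estimates, which mirrors the variation among the estimates that students write on the board.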

## A.6. Summary Discussion

Your tutor will write the class results for the two methods on the board. Get into groups and perform the following tasks.

• Find the five number summary of estimates for Method 1 in your class.

• Find the five number summary of estimates for Method 2 in your class.

• Use the five number summaries to decide which sampling method is more precise (i.e., less variable). Give reasons. Consider ways of improving the precision.

• Decide whether either method is precise enough to be useful.

• Discuss whether it is reasonable to compare a proportion estimated from a random sample of 20 points with one estimated from two lines with a total of 400 points. Give reasons. (Hint: You could consider the amount of effort required to collect each of these samples in the laboratory and in the field.)
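For classes working at a computer, the five number summaries could be computed as in the sketch below. Textbooks differ on how quartiles are defined; this version assumes the quartiles are the medians of the lower and upper halves of the sorted data, leaving out the overall median when the number of estimates is odd. The two lists of class estimates are invented for illustration.

```python
from statistics import median
from typing import Sequence, Tuple


def five_number_summary(values: Sequence[float]) -> Tuple[float, float, float, float, float]:
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two estimates")
    half = n // 2
    lower = data[:half]
    # Skip the middle value when n is odd so the halves have equal size.
    upper = data[half + 1:] if n % 2 else data[half:]
    return data[0], median(lower), median(data), median(upper), data[-1]


# Made-up class estimates, one per student, for each method.
method_1 = [0.22, 0.31, 0.18, 0.40, 0.27, 0.35]
method_2 = [0.25, 0.30, 0.20, 0.35, 0.30, 0.25]

for name, estimates in [("Transect", method_1), ("Random points", method_2)]:
    print(name, five_number_summary(estimates))
```

Comparing the range and the distance between the quartiles for the two methods gives a simple numerical basis for the precision discussion.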

# References

Buckland, S. T., Anderson, D. R., Burnham, K. P., and Laake, J. L. (1993), Distance Sampling: Estimating Abundance of Biological Populations, London: Chapman & Hall.

Hanley, J. A., and Shapiro, S. H. (1994), "Sexual Activity and the Lifespan of Male Fruitflies: A Dataset That Gets Attention," Journal of Statistics Education [Online], 2(1). (http://jse.amstat.org/v2n1/datasets.hanley.html)

Rossman, A. J., and Short, T. H. (1995), "Conditional Probability and Educational Reform: Are They Compatible?," Journal of Statistics Education [Online], 3(2). (http://jse.amstat.org/v3n2/rossman.html)

Glenys Bishop
Statistics Department