Journal of Statistics Education, V8N1: Albert Project Assignment

Sample Survey Project

Introduction

In this project, you will perform your own statistical inference using methods described in this course. Specifically, you will take a sample of students from your school to learn about one or more proportions of interest. After you take your sample, you will use inferential methods to see how the observations have modified your beliefs about each proportion. Here I outline the different steps of the project and discuss constructing a prior, selecting a random sample, summarizing the sample information, and computing the posterior probability distribution.

Getting Started

Begin by thinking of one proportion of your student body that you wish to learn about. This proportion might be the fraction of students who agree with a particular position on an issue, the fraction who prefer one flavor of ice cream to another, or the fraction who participate in a special activity. Suppose, for the sake of illustration, that you are interested in the proportion of the student body who regard their political philosophy as conservative. Then p would denote the proportion of conservative students in the entire student body.

Once you decide on a proportion of interest, construct a question that will be asked of each student in your sample. If, for example, you are interested in the proportion of students who think of themselves as conservative, you might ask the question

Do you think your political philosophy is conservative?

The possible responses to this question would be "yes," "no," or "I don't know," and you learn about p by counting the number of yes's in your sample.

Constructing a Prior

Before a sample is taken, you likely have some opinions about the location of the proportion value p. Your opinions about this proportion are represented by means of a prior probability distribution. For simplicity, suppose that p can be one of the eleven values 0, .1, .2, .3, .4, .5, .6, .7, .8, .9, 1. By following the worksheet, you can construct a prior distribution that approximately reflects your knowledge about p.
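The worksheet is not reproduced here, so the following is only a minimal sketch of one way to hold such a prior on a computer. It assumes you first assign a relative weight to each of the eleven values of p and then rescale the weights so they sum to 1. The weights shown are hypothetical and are not taken from the worksheet.

```python
# The eleven candidate values for p, as in the text: 0, .1, ..., 1.
P_VALUES = [i / 10 for i in range(11)]


def normalize(weights):
    """Rescale nonnegative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return [w / total for w in weights]


# Hypothetical weights (an assumption for illustration): most belief
# is placed near p = .3, and none at the extreme values.
weights = [0, 2, 6, 10, 6, 3, 2, 1, 0, 0, 0]
prior = normalize(weights)

for p, prob in zip(P_VALUES, prior):
    print(f"p = {p:.1f}   prior = {prob:.3f}")
```

A value of p that is given prior probability 0 will also have posterior probability 0, no matter what the data show, so assign a weight of 0 only to values you regard as impossible.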

Taking a Random Sample

In Topic 10, we discussed how to take a Simple Random Sample (SRS), which is one type of random sample. However, the procedure for taking an SRS (labeling each member of the population and using the random digit table) is impractical when the population is large, as it will be if your school has many students.

A Systematic Random Sample is another type of random sample that is easier to take when you have a listing of the population, such as a phone book containing the names and phone numbers of all students at your school.

Here is how you take a Systematic Random Sample:

Decide on a step size. Here we'll use 50, but other values can be used.
Decide on a starting place to sample in the population listing. (This should be done in some random fashion.) Say we decide to start on the 17th listing on page 50 of the phone book.
Add the step size (50) to each listing to get the next one to sample. So we'll sample the 17th, 67th, 117th, 167th, 217th, ... listings.
Continue until you've got a large enough sample.
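The steps above can be sketched in a few lines of Python, assuming the population listing is available as a list. The function name, the `directory` list, and the rule for choosing a random start (somewhere among the first `step` entries) are assumptions made for illustration.

```python
import random


def systematic_sample(listing, step=50, sample_size=20, start=None):
    """Return every `step`-th entry of `listing`, beginning at index `start`.

    Indices are 0-based, so start=16 is the 17th listing.
    """
    if start is None:
        # Assumption: choose the random start among the first `step` entries.
        start = random.randrange(min(step, len(listing)))
    picks = listing[start::step]   # start, start + step, start + 2*step, ...
    return picks[:sample_size]     # stop once the sample is large enough


# Hypothetical directory of 1000 students.
directory = [f"student_{i}" for i in range(1, 1001)]

# Start at the 17th listing and step by 50, as in the example above.
sample = systematic_sample(directory, step=50, sample_size=5, start=16)
print(sample)
```

With these settings the sample consists of the 17th, 67th, 117th, 167th, and 217th listings.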

What if a person is not home when you call? I would just skip this person and keep sampling. However, skipping the people who are not home might bias your sample. Why?

Data Analysis

Suppose that you have taken your random sample, and you have a list of responses of the type "yes" or "no" that are the responses to your question. You can summarize these data using the basic techniques described in Topic 1. A count table is helpful for finding the number and proportion of yes's and no's, and a bar graph can be used to display these data.
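As a rough illustration, a count table and a simple text bar graph can be produced as follows. The responses listed here are made up for the example and are not real survey data.

```python
from collections import Counter

# Hypothetical responses from ten students (not real data).
responses = ["yes", "no", "no", "yes", "I don't know",
             "no", "yes", "no", "no", "yes"]

counts = Counter(responses)   # number of times each response occurs
n = len(responses)

# Count table with proportions, plus one '#' per response as a bar graph.
print(f"{'Response':<14}{'Count':>5}  {'Prop.':>5}")
for answer, count in counts.most_common():
    print(f"{answer:<14}{count:>5}  {count / n:>5.2f}  {'#' * count}")
```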

Statistical Inference

After the prior distribution for the proportion has been constructed and the data have been collected, we use the methodology of Topic 16 to compute the posterior probability distribution. A JavaScript program, available at http://jse.amstat.org/secure/v8n1/p_discrete.html, can be used in this computation. You enter the prior probabilities into one column of the spreadsheet and input the number of yes's and no's, and the program computes the posterior distribution.
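If you want to check the program's output by hand, the sketch below applies Bayes' rule for a discrete prior: each prior probability is multiplied by the likelihood p^(number of yes's) × (1 − p)^(number of no's), and the products are rescaled to sum to 1. It is presumably the same calculation as in Topic 16, but it is not the code of the JavaScript program. The uniform prior and the counts of 4 yes's and 6 no's are assumptions for illustration, and "I don't know" responses are simply left out.

```python
def posterior(p_values, prior, yes, no):
    """Posterior probabilities for a proportion p with a discrete prior."""
    # Likelihood of the data for each candidate value of p.
    likelihoods = [p ** yes * (1 - p) ** no for p in p_values]
    products = [pr * like for pr, like in zip(prior, likelihoods)]
    total = sum(products)
    if total == 0:
        raise ValueError("the data are impossible under this prior")
    return [x / total for x in products]


p_values = [i / 10 for i in range(11)]   # 0, .1, ..., 1
uniform_prior = [1 / 11] * 11            # hypothetical prior: all values equally likely
post = posterior(p_values, uniform_prior, yes=4, no=6)

for p, pr, po in zip(p_values, uniform_prior, post):
    print(f"p = {p:.1f}   prior = {pr:.3f}   posterior = {po:.3f}")
```

In this example the values p = 0 and p = 1 receive posterior probability 0, since at least one yes and at least one no were observed.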

The Project Report

Write a report that describes all parts of this scientific study. This report should be divided into three stages.

The first stage of the report describes the choice of a survey question and the construction of the prior probability distribution. How did you decide on your particular inference problem? Did any personal experience, or anything you heard or read in the newspaper, motivate you to choose your question? Before you took your sample, what did you think you would find out? Include the prior probability worksheet that you used to construct your prior distribution.

The second stage of the report describes the "data phase" of the investigation. Describe in detail how you took your random sample. Describe the method you used, how many students were contacted, and any difficulties you experienced in collecting these data.

Give the results of your survey, including the number who said yes, no, or something else. Include any graphs that you made to summarize these data. Include the raw data for the individual students in the report.

The third stage of the report describes the statistical inference. Write down the posterior probability distribution and graph the probabilities. Construct a probability interval for the proportion of interest; describe the methodology you are using and give all the details of your computation.

It is important to explain, using language that a layman would understand, what your interval estimate means. In particular, if you are computing a 95% interval, explain what 95% means. Interpret this interval in the context of your example.
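One possible way to construct such an interval from a discrete posterior distribution is sketched below: collect the most probable values of p, one at a time, until their total probability reaches the desired level. This is offered only as an illustration; the method taught in your course may differ, and the posterior probabilities used here are hypothetical.

```python
def probability_set(p_values, post, level=0.95):
    """Collect the most probable values of p until `level` is covered."""
    # Rank the values of p from most to least probable.
    ranked = sorted(zip(post, p_values), reverse=True)
    chosen, covered = [], 0.0
    for prob, p in ranked:
        chosen.append(p)
        covered += prob
        if covered >= level:
            break
    return sorted(chosen), covered


p_values = [i / 10 for i in range(11)]
# Hypothetical posterior probabilities (an assumption, not real results).
post = [0, 0.02, 0.15, 0.35, 0.30, 0.13, 0.04, 0.01, 0, 0, 0]

values, covered = probability_set(p_values, post, level=0.95)
print(values, round(covered, 2))
```

For these probabilities, the set runs from .2 to .6 and has total probability .97. In words: given the prior and the data, the probability that p is one of these values is about .97, which is at least the 95% that was asked for.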

To conclude this part of the report, you should explain what you learned from this project. Were you surprised by the results? How different were the prior and posterior distributions? What problems did you experience in doing the project? If you had to do the project over again, what would you do differently?