Protocols and Pilot Studies: Taking Data Collection Projects Seriously

Thomas H. Short and Joseph G. Pigeon
Villanova University

Journal of Statistics Education v.6, n.1 (1998)

Copyright (c) 1998 by Thomas H. Short and Joseph G. Pigeon, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Assessment; Clinical study; Planning; Rubric.


Although there is consensus among statistics educators that student data collection projects are of substantial value, we feel that the planning and piloting phases of data collection are often neglected. We ask our students to write protocols, or detailed plans for how the data will be collected, and to plan and conduct pilot studies before embarking on full-scale data collections. We present examples and results from settings including introductory statistics courses for college freshmen, graduate statistics courses, and teacher training workshops.

1. Background

1 This decade has seen an increase in the number of recommendations that students in statistics courses should carry out data collection projects and in the number of statistics educators with advice about how to implement such projects. Much of the momentum to incorporate projects can be attributed to challenges such as that issued by Hogg (1991), who urged educators to provide projects that give students experience in "asking questions, defining problems, formulating hypotheses and operational definitions, designing experiments and surveys, collecting data and dealing with measurement error, summarizing data, analyzing data, communicating findings, and planning 'follow-up' experiments suggested by the findings."

2 In this article we propose contexts that address the "asking questions, defining problems, and planning 'follow-up' experiments" phases of a data collection project. The specific assignments we implement are protocols, which are detailed plans for data collections, and pilot studies, which are small-scale rehearsals of larger data collections.

3 A number of articles about projects and guidelines for managing student projects have appeared in the recent statistics education literature. The introductory statistics course projects proposed by Halvorsen and Moore (1991) include a proposal stage. Students are asked to write a detailed plan for a research question of their choice. The instructor evaluates the plan before the students are allowed to continue with the project. We propose specific protocol activities that capture the same "propose, revise, proceed" flavor described by Halvorsen and Moore.

4 Ledolter (1995) describes an involved, semester-long project for students in the second semester of a "Statistical Methods" course. His project assignment is "relatively unstructured," allowing the students to specify a problem, then collect and analyze the data. Although Ledolter's discussion of projects is valuable, and the accompanying list of projects his students have attempted is interesting, he does not emphasize the importance of the planning stages of data collection.

5 The project assignment described by Fillebrown (1994) follows a structure that we encounter frequently when discussing projects with other statistical educators. Students are given a series of milestones to pass throughout a semester. Some of the milestones involve topic selection and elementary planning. The instructor imposes constraints on the data to be collected such as a mix of quantitative and qualitative information and a minimum number of variables and observations.

6 Mackisack (1994) discusses projects that emphasize experimental design, an approach that encourages careful planning and preparation before the data are actually collected. Students are required to document the details of the design of an experiment of their choosing.

7 Student project assignments have even found their way into introductory statistics textbooks. The text by Wardrop (1995) is notable because it emphasizes the language and conceptual structure of experimental design, and it includes exercises that are based on the results of student projects. The protocol activities we present would be a useful supplement at the beginning of a course taught out of a text such as Wardrop's, because they encourage attention to specific details about how the design concepts are to be implemented in a student project.

8 In an NCTM Addenda Series volume on Data Analysis, Burrill et al. (1992) distinguish between the objectives of long-term and short-term projects. Especially in first statistics courses and workshops, it is unrealistic to expect students to complete a high-quality "long-term" project.

9 Some of the articles we mention highlight the data collection steps in a statistical analysis. We wish to shift the focus even farther from the analysis and interpretation of statistical results, and concentrate exclusively on the planning of good data collections. Students and teachers often underestimate the complexity of planning data collections. The examples we provide are of graduated complexity and encourage the development of planning and general problem-solving skills in our students.

2. Protocols

10 One of the major planning tools used by the pharmaceutical industry is the clinical protocol. A clinical protocol is a written set of explicit instructions for carrying out a clinical study. It clearly identifies the objectives, subjects, statistical techniques to be used, and the expected results of a clinical study. An outline of the components that typically appear in a clinical protocol is given below. More information about clinical protocols can be obtained from clinical trials textbooks such as Pocock (1983) and Wooding (1994).

Components of a Typical Clinical Protocol

  1. Title

  2. Summary

  3. Background and Rationale

  4. Study Objectives

  5. Patient/Subject Definition

  6. Study Design/Treatment Definition

  7. Concurrent Treatments

  8. Clinical and Laboratory Measurements

  9. Data Analysis

  10. Adverse Experiences

  11. Informed Consent

  12. Publications

  13. Investigator Signatures

11 Having students consider these components at the beginning of a project assignment is an important and often overlooked part of a data collection experience. Writing a protocol provides students with an introduction to practical difficulties commonly encountered in the beginning stages of a new investigation. We have used three variations of protocol assignments in the statistics courses we teach, and these are presented here in order of complexity, from the most straightforward to the most open-ended.

12 The first and most straightforward protocol assignment gives students a specific objective to consider. The U.S. News and World Report Guide to America's Best Colleges reports the estimated cost for books at Villanova. The students are asked to design a study that will provide such an estimate. This is a seemingly simple assignment, but it forces students to realize that planning any study is not trivial. Before any data are collected, careful consideration must be given to issues such as population definition, random and representative sampling, logistics of data collection, choice of statistics (mean vs. median, for example), and sample size.

13 Many of the students choose to design surveys, and some plan observational studies to be carried out at the bookstore cash registers. One student suggested asking the bookstore how much was spent on textbooks last semester, asking the registrar how many students were enrolled at Villanova last semester, and simply dividing the first value by the second. Compared to the complicated surveys proposed by other students, this suggestion seemed rather elegant, although it remains open to criticism.
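To make the mean-versus-median choice and the ratio approach concrete, the sketch below computes all three estimates side by side. Every number in it is hypothetical, invented for illustration; none of it is Villanova data, and the variable names are our own.

```python
from statistics import mean, median

# Hypothetical survey: semester book spending (dollars) reported by ten
# randomly sampled students. One student bought almost nothing and one
# bought many lab manuals, so the distribution is skewed.
survey_responses = [310, 285, 420, 350, 15, 390, 275, 330, 905, 360]

survey_mean = mean(survey_responses)      # pulled toward the 905 value
survey_median = median(survey_responses)  # resistant to the extremes

# Hypothetical version of the ratio approach: total textbook sales reported
# by the bookstore divided by enrollment reported by the registrar.
bookstore_textbook_sales = 3_200_000.0  # dollars, hypothetical
registrar_enrollment = 10_000           # students, hypothetical
ratio_estimate = bookstore_textbook_sales / registrar_enrollment

print(f"survey mean:    ${survey_mean:.2f}")     # $364.00
print(f"survey median:  ${survey_median:.2f}")   # $340.00
print(f"ratio estimate: ${ratio_estimate:.2f}")  # $320.00
```

Writing the ratio approach down this way also exposes what it assumes. For example, bookstore sales miss books bought elsewhere, and enrollment counts students who may buy no books at all. Those are the kinds of criticisms class discussion can bring out.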

14 Students work on the first protocol assignment individually, and approaches are shared during class discussion. It is important for students to see the variation in approaches adopted by their classmates.

15 The second variation on the protocol assignment is to provide the students with data that have already been collected, and then ask them to work backwards to create a possible protocol. Certainly the students can be creative as they write their instructions, but the existence of the dataset provides some structure to be used as a common basis.

16 In a graduate-level Clinical Trials course, we provide data (Table 1) from a bioequivalence study described in one of the "favorite datasets" supplied by Bradstreet (1991). In this example, a drug consisting of two active ingredients, or components, is manufactured by two processes. The two processes could represent products from two competitors or two formulations within one company. Fourteen male subjects were involved in a two-period crossover design, switching either from product A to product B or vice versa. The concentrations of the two components were measured repeatedly, producing curves of concentration against time. The following response variables were recorded for each subject and each component: AUC = Area Under the Concentration vs. Time Curve, CMax = Value of the Maximum Concentration, and TMax = Time of Maximum Concentration.

Table 1: Bioequivalence Study Data

Sub  Seq  Comp    AUC(A)    AUC(B)   CMax(A)   CMax(B)   TMax(A)   TMax(B)
  1  A,B     1   5253.05   4781.60    1219.4    1238.4       1.0       0.5
             2    752.84    643.59     136.6     128.0       5.0       3.0
  2  B,A     1   4612.11   4720.70     859.4    1028.2       1.5       5.0
             2    578.51    793.77      94.1     131.9       3.0       5.0
  3  B,A     1   4443.50   4949.29     812.0    1483.8       4.0       1.5
             2    260.68    297.44      46.3      64.2       4.0       2.5
  4  A,B     1   5105.56   5349.76    1453.9    1280.7       1.5       2.0
             2    367.62    417.40      69.1      94.3       4.0       4.0
  5  B,A     1   6387.73   6828.42    1411.5    1968.1       1.5       1.0
             2    954.56    999.94     160.7     182.6       4.0       3.0
  6  A,B     1   3443.15   3215.03     854.1    1164.3       1.0       1.0
             2    282.61    341.86      66.7     101.7       3.0       1.0
  7  A,B     1   4484.19   4097.19    1146.3    1042.5       1.5       2.5
             2    991.46    866.44     194.4     169.8       2.5       4.0
  8  B,A     1   6358.39   6431.11    1568.1    1332.8       2.5       1.5
             2   1266.90   1144.67     227.5     206.6       5.0       2.5
  9  B,A     1   4960.48   5119.09    1042.4    1296.9       5.0       2.0
             2    550.63    553.32      85.4      95.7       6.0       3.0
 10  A,B     1   2803.61   3332.69     639.3     947.1       4.0       0.5
             2    323.48    400.28      64.8      78.3       3.0       3.0
 11  B,A     1   4612.30   4641.99     966.3    1382.0       0.5       1.0
             2    734.18    910.00     149.2     199.1       5.0       4.0
 12  A,B     1   4696.52   4592.69    1001.0    1251.2       2.5       1.0
             2    706.59    756.07     134.2     135.6       4.0       4.0
 13  A,B     1   5254.33   5348.04    1129.5    1381.0       3.0       2.0
             2    878.14   1284.34     128.8     203.5       4.0       6.0
 14  B,A     1   3419.08   2192.47     745.2     744.5       3.0       1.0
             2    429.69    380.38      99.5      80.4       3.0       3.0

Sub = subject; Seq = order in which the products were received; Comp = active-ingredient component; (A) and (B) = product.

17 The research questions of interest are also provided to the students: First, are the manufacturing processes equivalent, and second, should the component active ingredients be evaluated separately or combined? We provide the data on a handout with some smudged characters to simulate a "non-clean" dataset. This encourages interaction between the student (data analyst) and the instructor (study investigator) to effectively "clean" the dataset before the analysis begins. Students are then asked to recreate a protocol that could have yielded the given data.
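A student protocol's Data Analysis section might address the first research question with the conventional average-bioequivalence comparison: analyze log AUC, estimate the A/B ratio of geometric means with a 90% confidence interval, and check whether the interval lies within the 0.80 to 1.25 limits commonly used by regulators. The sketch below applies that approach to the component 1 AUC values in Table 1. It is our illustration, not the analysis given by Bradstreet (1991). It assumes the A and B columns are indexed by product rather than by period, uses the standard 2x2 crossover estimate based on half period-differences (which ignores carryover), and hard-codes the t quantile for 12 degrees of freedom. The names `AUC_COMPONENT_1` and `half_period_difference` are hypothetical.

```python
import math
from statistics import mean, variance

# Component 1 AUC values from Table 1: (sequence, AUC on product A, AUC on product B).
# Assumption: the A and B columns are indexed by product, so subjects in
# sequence "A,B" received A in period 1 and subjects in "B,A" received B first.
AUC_COMPONENT_1 = [
    ("A,B", 5253.05, 4781.60), ("B,A", 4612.11, 4720.70),
    ("B,A", 4443.50, 4949.29), ("A,B", 5105.56, 5349.76),
    ("B,A", 6387.73, 6828.42), ("A,B", 3443.15, 3215.03),
    ("A,B", 4484.19, 4097.19), ("B,A", 6358.39, 6431.11),
    ("B,A", 4960.48, 5119.09), ("A,B", 2803.61, 3332.69),
    ("B,A", 4612.30, 4641.99), ("A,B", 4696.52, 4592.69),
    ("A,B", 5254.33, 5348.04), ("B,A", 3419.08, 2192.47),
]

# Upper 5% point of Student's t with 12 df (7 + 7 - 2), for a two-sided 90% CI.
T_95_DF12 = 1.7823


def half_period_difference(seq, auc_a, auc_b):
    """(log period-1 AUC - log period-2 AUC) / 2 for one subject."""
    log_a, log_b = math.log(auc_a), math.log(auc_b)
    return (log_a - log_b) / 2 if seq == "A,B" else (log_b - log_a) / 2


d_ab = [half_period_difference(s, a, b) for s, a, b in AUC_COMPONENT_1 if s == "A,B"]
d_ba = [half_period_difference(s, a, b) for s, a, b in AUC_COMPONENT_1 if s == "B,A"]
n1, n2 = len(d_ab), len(d_ba)

# Difference of sequence means estimates log(A) - log(B); the period effect cancels.
log_ratio = mean(d_ab) - mean(d_ba)

# Pooled within-sequence variance of the half differences (n1 + n2 - 2 df).
pooled_var = ((n1 - 1) * variance(d_ab) + (n2 - 1) * variance(d_ba)) / (n1 + n2 - 2)
se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

ratio = math.exp(log_ratio)
lower = math.exp(log_ratio - T_95_DF12 * se)
upper = math.exp(log_ratio + T_95_DF12 * se)
within_limits = 0.80 <= lower and upper <= 1.25

print(f"A/B geometric mean ratio: {ratio:.3f}, 90% CI ({lower:.3f}, {upper:.3f})")
print("CI within 0.80-1.25" if within_limits else "CI not within 0.80-1.25")
```

Repeating the calculation for component 2, and for CMax, gives students a concrete way into the second research question: whether the components should be evaluated separately or combined.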

18 We have been surprised by the variety of legitimate protocols our students have constructed from the same set of data. For example, student protocols have identified study subjects ranging from college students to convicted prisoners and study treatments ranging from over-the-counter cold medications to AIDS treatments. Class discussion of the different approaches makes the students aware of options one might encounter and decisions that must be made in the planning of a clinical study.

19 The third and most open-ended example of a protocol assignment is to have students write data collection plans to validate common sayings. Students working in groups can come up with varied and interesting approaches to investigate sayings that most find very familiar. This is an entertaining activity for audiences at many different levels of sophistication and has also been used in the pharmaceutical industry to help non-statistical clinicians appreciate the value of careful planning of clinical studies.

20 Here are some of the sayings that we have used in courses and workshops:

21 This activity is best suited for group work because students interacting in pairs or groups experience the advantage of brainstorming with classmates and also learn to negotiate the details of an experiment. Having the groups present their protocols to the class caps off an interesting and enjoyable classroom exercise.

3. Pilot Studies

22 In our experience, students frequently enjoy long data collection and analysis projects, but they are sometimes disappointed when they realize that more careful planning could have produced better results. Mary Parker, quoted in Cobb (1992), describes this disappointment:

I haven't had students plan studies and gather data in a couple of years, for various reasons. I wasn't happy with the actual projects during the four semesters I did it. However, I was quite happy with the students' experiences. Almost every one of them, by the time they finished, was rather sheepish about what a poor study it turned out to be, because they could see all the ways it really should be improved.

23 One way of limiting the disappointment that may be associated with a longer project is to engage the students in a pilot study. A pilot study is a small-scale rehearsal for a larger main study and follows naturally from a written protocol. The purpose of a pilot study is to learn more about the data acquisition process without investing large amounts of time and resources.

24 It is unrealistic to expect students to complete a full-scale data analysis project in a one-semester course, but it is reasonable for them to plan a study, carry out a pilot, and then revise the plan for a hypothetical larger study. Thus we often require students to work in teams to conduct a pilot study according to a protocol written either by themselves or by other students and, after completing the pilot study, to critique and revise the protocol. By going through this cycle of planning, doing, and revising at least once, students engage in a realistic and meaningful data collection exercise even if the proposed main study is never carried out. Of course, if time permits, they will be further enriched by completing the actual data collection, statistical analyses, and final written report for the main study.
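One concrete revision a pilot makes possible is choosing the main study's sample size from the spread the pilot actually observed. The sketch below uses the familiar normal-approximation formula n = (z s / E)^2 for estimating a mean to within a margin of error E, where s is the pilot standard deviation. The pilot values, the $25 margin, and the function name `sample_size_for_mean` are hypothetical. Because s from a small pilot is itself uncertain, the result is a rough planning guide, not a guarantee.

```python
import math
from statistics import stdev


def sample_size_for_mean(pilot_values, margin_of_error, z=1.96):
    """Smallest n with z * s / sqrt(n) <= margin_of_error, where s is the pilot SD.

    z = 1.96 gives roughly 95% confidence under a normal approximation.
    """
    s = stdev(pilot_values)  # sample standard deviation (n - 1 divisor)
    return math.ceil((z * s / margin_of_error) ** 2)


# Hypothetical pilot: semester book spending (dollars) reported by 8 students.
pilot_spending = [310, 285, 420, 350, 190, 390, 275, 330]
n_needed = sample_size_for_mean(pilot_spending, margin_of_error=25)
print(f"Plan to survey about {n_needed} students for a +/- $25 margin of error.")
```

A team that finds the required n impractical has learned something useful before the main study: it can accept a wider margin, reduce measurement variability, or narrow the population.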

25 A pilot study assignment can be a valuable learning experience even after the preparation of a thoughtful protocol because the pilot study helps to refine the details that may have been overlooked in the protocol. Consider the example of an experiment to measure growth in plants under different types of fertilizer. A well-written protocol may address many important issues such as baseline measurements, blocking, and sample size, but it is possible that the protocol may not include specifics of how growth should be measured. A small pilot study could result in a refined protocol that might clarify details such as at what point on a plant the measurement should be made and whether the plant should be straightened before measuring. Only the experience of applying the directions provided in a protocol can make students aware of the level of detail required to conduct a careful scientific experiment.

4. Protocol Assessment

26 Knowing the characteristics of an appropriate protocol can be difficult for statistical educators who do not have experience consulting during the planning stages of data collection projects. Following the advice on evaluation given by Garfield (1994), we propose a rubric that summarizes the major characteristics we think are important for writing an appropriate plan for a data collection. The rubric is intended to provide guidance for the evaluation of written protocols, and it specifically highlights the statement of objectives, identification of variables and eligible subjects, planned statistical analyses, and contingencies for unusual or unexpected circumstances.

27 In addition to the obvious use of a rubric by an instructor evaluating student work, it can also be used by students as they critique a protocol provided to them. Students benefit from taking the perspective of an evaluator as a complement to their usual role as producers of statistical results.

Rubric for Evaluating Protocols

Assign 0-3 points for each of the following categories:

  1. States the background and objectives of the study

    0 pt: Fails to state objective

    1 pt: States objective but fails to set it in a context

    2 pt: Objective is ambiguous or context is vague

    3 pt: Clearly states objective within a context

  2. Identifies subjects and variables

    0 pt: No description of subjects or variables

    1 pt: Missing description of subjects or variables

    2 pt: Ambiguous variable definitions or vague eligibility

    3 pt: Eligibility and variables clearly described

  3. Describes statistical analysis

    0 pt: No consideration of statistical analysis

    1 pt: Suggests displays that don't match structure of the data

    2 pt: Appropriate displays, unclear plans for inference

    3 pt: Displays and inferential tools are appropriate and clear

  4. Describes contingencies

    0 pt: No consideration for consent or unusual circumstances

    1 pt: Superficial mention of possible contingencies

    2 pt: Consideration of consent and unusual circumstances

    3 pt: Detailed plans for consent and unusual circumstances
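The rubric is easy to tally by hand, but when many protocols are scored, or students score one another's work, a small helper can total the four categories and catch missing or out-of-range entries. The sketch below is a hypothetical convenience rather than part of the rubric itself; the category labels are shortened paraphrases of the rubric headings, and `score_protocol` and `weakest_categories` are names we made up.

```python
# Shortened labels for the four rubric categories (paraphrased headings).
RUBRIC_CATEGORIES = (
    "background and objectives",
    "subjects and variables",
    "statistical analysis",
    "contingencies",
)
MAX_POINTS = 3  # each category is scored 0-3


def score_protocol(scores):
    """Return the total rubric score (0-12) for one protocol.

    `scores` maps each category label to an integer from 0 to 3.
    Raises ValueError if a category is missing, unknown, or out of range.
    """
    unknown = set(scores) - set(RUBRIC_CATEGORIES)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    missing = [c for c in RUBRIC_CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"no score for: {missing}")
    for category, points in scores.items():
        if not isinstance(points, int) or not 0 <= points <= MAX_POINTS:
            raise ValueError(f"{category}: {points!r} is not an integer from 0 to {MAX_POINTS}")
    return sum(scores.values())


def weakest_categories(scores):
    """Categories sharing the lowest score, useful for targeting feedback."""
    low = min(scores.values())
    return [c for c in RUBRIC_CATEGORIES if scores[c] == low]


example = {
    "background and objectives": 3,
    "subjects and variables": 2,
    "statistical analysis": 1,
    "contingencies": 2,
}
print(score_protocol(example), "of", MAX_POINTS * len(RUBRIC_CATEGORIES))  # 8 of 12
print(weakest_categories(example))  # ['statistical analysis']
```

Reporting the weakest category alongside the total keeps attention on what to revise, which matches the rubric's purpose as guidance rather than as a grade alone.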

5. Concluding Remarks

28 We have used protocol and pilot study assignments for several different audiences including freshman liberal arts students, nursing students, K-12 teachers, and graduate students in applied statistics. Variations of context and focus seem to make the basic activities interesting and informative for all of these audiences.

29 We find that protocol and pilot study assignments develop global problem-solving and communication skills in our students. Students are asked to plan ahead as they consider the objectives and desired conclusions. Writing, revising, and following written instructions are all skills that can be incorporated naturally into the assignments.

30 One of the most important advantages of careful planning of data collections and subsequent analyses is that students are encouraged to keep the structure of the data within the boundaries of their statistical expertise. Students tend to be overly ambitious in the planning stages of even simple studies, and they sometimes collect data that are far more complicated than they are able to analyze. A protocol assignment helps to connect data collection to the other course material appropriate for the level of sophistication of the students, especially those at the introductory level.

31 Protocol and pilot study assignments provide students with serious, realistic data collection activities in the spirit of active learning through projects. We have found them to be valuable assignments in our own statistics courses at all levels.


Bradstreet, T. E. (1991), "Some Favorite Data Sets from Early Phases of Drug Research," in Proceedings of the Section on Statistical Education, American Statistical Association, pp. 190-195.

Burrill, G., Burrill, J. C., Coffield, P., Davis, G., de Lange, J., Resnick, D., and Siegel, M. (1992), Data Analysis and Statistics Across the Curriculum, Reston, VA: The National Council of Teachers of Mathematics.

Cobb, G. (1992), "Teaching Statistics," in Heeding the Call for Change: Suggestions for Curricular Action, ed. L. A. Steen, MAA Notes, No. 22, Washington, DC: The Mathematical Association of America, pp. 3-34.

Fillebrown, S. (1994), "Using Projects in an Elementary Statistics Course for Non-Science Majors," Journal of Statistics Education [Online], 2(2).

Garfield, J. (1994), "Beyond Testing and Grading: Using Assessment to Improve Student Learning," Journal of Statistics Education [Online], 2(1).

Halvorsen, K. T. and Moore, T. L. (1991), "Motivating, Monitoring, and Evaluating Student Projects," in Proceedings of the Section on Statistical Education, American Statistical Association, pp. 20-25.

Hogg, R. V. (1991), "Statistical Education: Improvements are Badly Needed," The American Statistician, 45, 342-343.

Ledolter, J. (1995), "Projects in Introductory Statistics Courses," The American Statistician, 49, 364-367.

Mackisack, M. (1994), "What is the Use of Experiments Conducted by Statistics Students?" Journal of Statistics Education [Online], 2(1).

Pocock, S. J. (1983), Clinical Trials: A Practical Approach, Chichester, UK: John Wiley & Sons, Inc.

Wardrop, R. L. (1995), Statistics: Learning in the Presence of Variation, Dubuque, Iowa: Wm. C. Brown, Publishers.

Wooding, W. M. (1994), Planning Pharmaceutical Clinical Trials: Basic Statistical Principles, New York: John Wiley & Sons, Inc.

Thomas H. Short
Department of Mathematical Sciences
Villanova University
Villanova, PA 19085-1699

Joseph G. Pigeon
Department of Mathematical Sciences
Villanova University
Villanova, PA 19085-1699
