Teaching Bits: A Resource for Teachers of Statistics

Journal of Statistics Education v.4, n.2 (1996)

Joan B. Garfield
Department of Educational Psychology
University of Minnesota
332 Burton Hall
Minneapolis, MN 55455

J. Laurie Snell
Department of Mathematics and Computing
Dartmouth College
Hanover, NH 03755-1890

This column features "bits" of information sampled from a variety of sources that may be of interest to teachers of statistics. Joan abstracts information from the literature on teaching and learning statistics, while Laurie summarizes articles from the news and other media that may be used with students to provoke discussions or serve as a basis for classroom activities or student projects. We realize that due to limitations in the literature we have access to and time to review, we may overlook some potential articles for this column, and therefore encourage you to send us your reviews and suggestions for abstracts.

From the Literature on Teaching and Learning Statistics

"Class Activities with Student-Generated Data"

by Kay Somers, John Dilendi, and Bettie Smolansky (1996). Mathematics Teacher, 89(2), 105-107.

This article suggests ways data may be gathered in class and used to illustrate simple statistical techniques. An interesting activity involves two memorization tasks -- one where students memorize numbers without a context, and one that embeds numbers within years in which important historical events occurred. Students' scores on the two tasks are compared using different methods of graphing and descriptive statistics. Suggestions are offered for stimulating student discussion of the resulting data and guiding students' understanding by having them explain differences in graphs and measures of center and variability.

"Snap, Crackle, and Pop"

by Elaine B. Hofstetter and Laura A. Sgroi (1996). Mathematics Teaching in the Middle School, 1(9), 760-764.

Although this article describes a data gathering activity for middle school students, it can easily be adapted for high school or college statistics classes. Students are asked to bring empty cereal boxes to class and use them to gather and analyze data for several variables listed on the boxes, e.g., carbohydrate grams, sugar content, vitamins and minerals. Suggestions are also offered for analyzing price data and for conducting taste tests that involve constructing new variables such as "crunchiness." Even probability may be included, if students are asked to discuss prizes in cereal boxes and to solve the popular problem of how many boxes one should expect to purchase before getting at least one prize of each type.
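The cereal-box prize problem can be made concrete with a short calculation. No finite number of boxes makes a complete set certain, so the usual version asks for the expected number of boxes. The sketch below assumes (this is our assumption, not the article's) that each box contains one prize and that all prize types are equally likely; the function name and the choice of six prize types are hypothetical.

```python
from fractions import Fraction


def expected_boxes(num_prizes: int) -> Fraction:
    """Expected number of boxes needed to collect every prize type.

    Assumes one prize per box and equally likely prize types. After j
    distinct prizes are in hand, the wait for a new one is geometric with
    mean n / (n - j); summing these waits gives n * (1 + 1/2 + ... + 1/n).
    """
    harmonic = sum(Fraction(1, k) for k in range(1, num_prizes + 1))
    return num_prizes * harmonic


if __name__ == "__main__":
    # Hypothetical example: six different prizes.
    print(float(expected_boxes(6)))  # 14.7
```

With six prize types the expected number of boxes is 14.7, a figure students can compare with the counts they get by simulating purchases with a die.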

"Increasing Student Participation in Large Introductory Statistics Classes"

by Rhonda C. Magel (1996). The American Statistician, 50(1), 51-56.

To make a large statistics course more enjoyable, relevant, and engaging, this instructor decreased the time she spent giving lectures and instead asked students to complete worksheets during class either alone or in groups. Some worksheets required cooperative activities where students collected and analyzed data. The results of this "classroom research" were quite positive. The amount of student interaction increased, as did student learning and student satisfaction. In addition, the percentage of students withdrawing from the course decreased.

"Representing Probabilities with Pipe Diagrams"

by Clifford Konold (1996). Mathematics Teacher, 89(5), 378-382.

This author has developed a modified version of the familiar tree diagrams used to illustrate and solve probability problems. These "pipe diagrams" suggest a system of pipes through which water flows. "An event is represented by a particular pipe in the system and the probability of an event is equivalent to the fraction of the total amount of water entering at the left that flows into a particular section of the pipe." In contrast to tree diagrams, pipe diagrams are a concrete representation, illustrating that the probabilities of all mutually exclusive outcomes of a random event sum to one. This makes sense to students because all the water split among the pipes is obviously equal to the original amount. Some probability problems are used to illustrate the pipe diagram method, and research on students' difficulties in understanding probability is briefly summarized.

The following abstracts appeared in the June 1996 edition of the Newsletter of the International Study Group for Learning Probability and Statistics, edited by Carmen Batanero, University of Granada (e-mail: batanero@goliat.ugr.es).

"The Development and Validation of the Survey of Attitudes Toward Statistics"

by Schau, C., Dauphinee, T. L., and Del Vecchio, A. (1995). Educational and Psychological Measurement, 55, 868-875.

The Survey of Attitudes Toward Statistics (SATS) was designed for use in both research and instruction. A panel of instructors and introductory statistics students identified by consensus four facets of attitudes toward statistics: (a) Affect -- positive and negative feelings concerning statistics; (b) Cognitive Competence -- attitudes about intellectual knowledge and skills when applied to statistics; (c) Value -- attitudes about the usefulness, relevance, and worth of statistics; and (d) Difficulty -- attitudes about the difficulty of statistics as a subject. This structure was validated for a sample of undergraduate students using confirmatory factor analysis. Additional validity evidence was obtained through the correlation of the SATS with Wise's Attitudes Toward Statistics scale, which showed significant, positive relationships between the two instruments.

"Exploring Probability and Statistics With Preservice and Inservice Teachers"

by Quin, R. J. (1996). School Science and Mathematics, 96(5), 255-257.

This article presents a lesson designed for preservice and inservice teachers that permits participants to: a) strengthen their conceptual understanding, and b) experience learning in a cooperative environment that encourages communication. Opportunities are provided for participants to discover patterns and construct mathematical knowledge concerning theoretical probability.

"A Model for Assessing Higher Order Thinking in Statistics"

by Watson, J. M., Collis, K. F., Callingham, R. A., and Moritz, J. B. (1995). Educational Research and Evaluation, 1, 247-275.

As in other areas of the school curriculum, the teaching, learning and assessment of higher order thinking in statistics has become an issue for educators following the appearance of recent curriculum documents in many countries. These documents have included probability and statistics across all years of schooling and have stressed the importance of higher order thinking across all areas of the mathematics curriculum. This paper reports on a pilot project which applied the theoretical framework for cognitive development devised by Biggs and Collis to a higher order task in data handling in order to provide a model of student levels of response. The model will assist teachers, curriculum planners and other researchers interested in increasing levels of performance on more complex tasks. An interview protocol based on a set of 16 data cards was developed, tried with Grade 6 and 9 students, and adapted for group work with two classes of Grade 6 students. The levels and types of cognitive functioning associated with the outcomes achieved by students completing the task in the two contexts are discussed, as well as the implications for classroom teaching and for further research.

"Are Humans Good Intuitive Statisticians After All? Rethinking Some Conclusions From the Literature on Judgment Under Uncertainty"

by Cosmides, L., and Tooby, J. (1996). Cognition, 58, 1-73.

Professional probabilists have long argued over what probability means, with, for example, Bayesians arguing that probability refers to subjective degrees of confidence and frequentists arguing that probability refers to the frequencies of events in the world. Recently, Gigerenzer and his colleagues have argued that these same distinctions are made by untutored subjects, and that, for many domains, the human mind represents probabilistic information as frequencies. We analyze several reasons why, from an ecological and evolutionary perspective, certain classes of problem solving mechanisms in the human mind should be expected to represent probabilistic information as frequencies. Then, using a problem famous in the "heuristics and biases" literature, we show that correct Bayesian reasoning can be elicited in 76% of subjects -- indeed, 92% in the most ecologically valid condition -- simply by expressing the problem in frequentist terms. These results show that frequentist representations cause various cognitive biases to disappear, and that the conclusions most common in the literature on judgment under uncertainty will have to be re-examined.

The Journal of Educational and Behavioral Statistics -- Special Issue on Teaching Statistics

The Journal of Educational and Behavioral Statistics, 1996, 21(1) is dedicated to Teaching Statistics. After an introduction by the Editor, Betsy Jane Becker, the following articles are included:

"Evaluating Statistics Texts Used in Education"

by Michael R. Harwell, Mary Lee Herrick, Deborah Curtis, Daniel Mundfrom, and Karen Gold, pp. 3-34.

Evaluating texts is an important activity associated with teaching statistics. Surprisingly, the statistical education literature offers little guidance on how such evaluations should be conducted. This lack of guidance may be at least partly responsible for the fact that published evaluations of statistics texts almost invariably employ evaluation criteria that lack any theory-based rationale. This failing is typically compounded by a lack of empirical evidence supporting the usefulness of the criteria. This article describes the construction and piloting of instruments for evaluating statistics texts that are grounded in the statistical education and text evaluation literatures. The study is an initial step in a line of research which we hope will result in the establishment and maintenance of a database of evaluations of statistics texts. Evaluative information of this kind should assist instructors wrestling with text selection decisions and individuals charged with performing evaluations, such as journal reviewers, and should ultimately benefit the direct consumers of these texts -- the students.

"Identifying Impediments to Learning Probability and Statistics From an Assessment of Instructional Software"

by Steve Cohen, George Smith, Richard A. Chechile, Glen Burns, and Frank Tsai, pp. 35-54.

A detailed, multisite evaluation of instructional software designed to help students conceptualize introductory probability and statistics yielded patterns of error on several assessment items. Whereas two of the patterns appeared to be consistent with misconceptions associated with deterministic reasoning, other patterns indicated that prior knowledge may cause students to misinterpret certain concepts and displays. Misconceptions included interpreting the y-axis on a histogram as if it were a y-axis in a scatter plot and confusing the values a variable might take on by misinterpreting plots of normal probability distributions. These kinds of misconceptions are especially important to consider in light of the increased emphasis on computing and displays in statistics education.

"A Meta-Analysis of Gender Differences in Applied Statistics Achievement"

by Christine M. Schram, pp. 55-70.

This meta-analysis of gender differences examines statistics achievement in postsecondary-level psychology, education and business courses. Thirteen articles examining 18 samples were obtained and coded for the analysis. The average effect size was -0.08 standard deviation units favoring females; however, the results were heterogeneous. Although no model accounted for all between-studies variation, gender differences could best be predicted from the percentage of undergraduate students in the sample, the department offering the course, and the use of course grade or points for the outcome measure. Undergraduate males showed an advantage over undergraduate females. Univariate tests showed that males also significantly outscored females when the outcome was a series of exams. Conversely, females significantly surpassed males when the outcome was total course performance. Lastly, females outscored males in courses offered by business departments.

"A Look at the Literature (and Other Resources) on Teaching Statistics"

by Betsy Jane Becker, pp. 71-90.

Statistics instructors and others interested in the teaching of statistics will find many print and nonprint resources on this topic. The print literature on the teaching of statistics is largely anecdotal and comprises mainly recommendations for instruction based on the experiences and intuitions of individual instructors. Less than 30% of the print literature reports the results of empirical studies, but these cover a broad range of topics, including the use of computers in statistics instruction, teaching materials, and teaching strategies. A large portion of the nonempirical literature is devoted to descriptions of statistics courses and specific lessons that, though untested, still provide a resource for instruction. Recently numerous nonprint (electronic) resources for instruction, problem solving, and discussions about statistics instruction have also become available. These include many data sets and other instructional resources, statistics discussion groups, and the electronic Journal of Statistics Education.

Teaching Statistics

A regular component of the Teaching Bits Department is a list of articles from Teaching Statistics, an international journal based in England. Brief summaries of the articles are included. In addition to these articles, Teaching Statistics features several regular departments that may be of interest, including Computing Corner, Curriculum Matters, Data Bank, Historical Perspective, Practical Activities, Problem Page, Project Parade, Research Report, Book Reviews, and News and Notes.

The Circulation Manager of Teaching Statistics is Peter Holmes, ph@maths.nott.ac.uk, RSS Centre for Statistical Education, University of Nottingham, Nottingham NG7 2RD, England.

Teaching Statistics, Summer 1996
Volume 18, Number 2

"Statistics and Intuition for the Classroom" by Sangit Chatterjee and James Hawkes

The authors present examples and case studies of statistical reasoning, thinking, and intuition that arise in the perception of randomness, particularly for random walks. The relationships between art and science can be explored through various notions of the statistical concept of randomness.

"To Boxplot or Not to Boxplot?" by Gary Kader and Mike Perry

The object of this paper is to point out a misapplication of boxplots, to suggest why the misuse occurs, and to suggest how we might adjust our teaching to correct it.

"Exploring Sampling" by James Nicholson

This article attempts to integrate the crucial ideas of sampling and estimation in the teaching of linear regression.

In addition to these articles, this issue includes the regular departments Practical Activities, Statistics at Work, Computing Corner, Research Report, Apparatus Review, Net Benefits, Software Review and Book Reviews. Included inside the journal is the summer issue of IASE Matters.

Topics for Discussion from Current Newspapers and Journals

"Erasing the Passed"

by Karen Avenoso. The Boston Globe, 14 July 1996, A1.

The Stratfield School in Fairfield, Connecticut, is a prize-winning, prestigious school with 500 students in kindergarten through fifth grade. In April, the superintendent held a news conference to announce that administrators of the Iowa Test of Basic Skills, a comprehensive national test given annually to many third and fifth graders, had studied the school's 150 answer sheets from January and had observed tampering on many exams.

This claim was based on comparisons of erasures on Stratfield's reading comprehension tests and those from two other high-performing elementary schools in Fairfield. The Stratfield tests had 3.5 to 5 times the number of erasures found on the tests of the other schools. Further, 89% of the Stratfield erasures resulted in a correct answer, compared to fewer than 70% of the changes made at other schools. Officials at Houghton Mifflin, the parent company of the test's publisher, stated that such results "clearly and conclusively indicate tampering."

The Stratfield third graders were retested. On the original test, the third graders scored in the 89th percentile of students nationally, while on the retake they dropped to the 79th percentile.

Connecticut's education department studied tests given to the Stratfield fourth graders in 1994 and 1995 and again found more erasures than on tests from other schools; 89% of the erasures gave correct answers.

Parents and teachers remain unconvinced that there is anything wrong with the tests. The Stratfield principal claims there is no evidence of tampering and that the high erasure rate is evidence of the school's emphasis on thorough, precise work.

The matter is being investigated by a former agent of the Federal Bureau of Investigation and a forensic scientist fresh from the O. J. Simpson trial.

"A Church Arson Epidemic? It's Smoke and Mirrors"

by Michael Fumento. Wall Street Journal, 8 July 1996, A8.

The author reviews the sequence of events that led the media to report a near epidemic of burning of black churches. He concludes that there is no evidence of an epidemic.

At the end of March, the Center for Democratic Renewal (CDR) held a press conference and released a preliminary report showing a surge in arsons against black churches beginning in 1990. The report claimed that there had been 90 arsons against black churches in nine Southern states since 1990, and that the number has risen each year, reaching 35 in 1996 as of June 18, and that every culprit arrested or detained has been white.

Fumento reports that, when he tried to verify the CDR claims from government sources, he found that they did not have adequate records. He was referred to a private group, the National Fire Protection Association, that had records of fires, but they were not broken down by race. Their records show a dramatic decrease in the number of church arsons from 1,420 in 1980 to 520 in 1994.

Fumento contacted law officials in South Carolina, Georgia, Alabama and Mississippi and reports findings very different from the claims of the CDR. For example, South Carolina had the most arsons (27) on the CDR list. Officials reported that seven of the fires were either found not to be arsons or the cause has not been found. Eight of the 18 people arrested in connection with the fires were black. Reports in the other states indicated a wide variety of causes for the fires and no particular pattern.

Another source of data was provided by extensive coverage in USA Today. Fumento suggests that those authors incorrectly interpreted their own data when they stated that "the numbers confirm that a sharp rise in black church arsons started in 1994 and continues." Fumento states that the USA Today charts indicated that two of the states did not start reporting data until 1993, and a third one did not start until 1995. It is not surprising that the totals increased when the states did start reporting.

Looking at just those states that have reported data since 1990, he finds that the number in 1990 (13) was two more than in 1994, and the number in 1991 (16) was the same as in 1995. He suggests that the large number in 1996 was in fact due to copycat crimes caused by the undue publicity given to the CDR report.

In a letter in the July 15th Wall Street Journal, a writer from USA Today points out that, by looking only at those states that have reported data since 1990, Fumento includes information on only six of 11 states. In particular, he leaves out Alabama, North Carolina, and South Carolina, where the heart of the increase lies. The writer claims that complete data have been available since 1993: there were 19 black church arsons in 1993, 27 in 1995, and 40 in the first half of 1996.

A more detailed analysis of the way the press and the politicians handled this story can be found in an article by Michael Kelly in the July 15, 1996, issue of The New Yorker.

"Sampling Wildlife Populations"

by Bryan F. J. Manly and Lyman L. McDonald (1996). Chance, 9(2), 9-19.

The authors discuss several current conservation problems that involve wildlife populations and the statistical methods used in trying to settle resulting disputes.

These methods are variations of classical methods for population estimation that go back to those used by Graunt in 1662 to estimate the population of London and by Laplace in 1783 to estimate the population of France. Laplace obtained a register of births for the whole country, including a set of parishes for which he knew the number of births b and the population size n. He then argued that the ratio of population size N to births B for the whole country should be approximately the same as the ratio for the parishes, i.e., N/B = n/b. He knew everything but N, so he estimated N by B*n/b. This is an example of an estimation method that has many forms and names; capture-recapture and mark-recapture are two common names used.
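Laplace's ratio argument is short enough to write out as a calculation. The sketch below is only an illustration: the function names and all numbers are hypothetical, and the second function is the simplest capture-recapture analogue (often called the Lincoln-Petersen estimator), which we include on the assumption that it is the kind of variation meant above.

```python
def laplace_ratio_estimate(total_births, parish_population, parish_births):
    """Estimate N from N/B = n/b, i.e. N is approximately B * n / b."""
    return total_births * parish_population / parish_births


def capture_recapture_estimate(marked, caught, recaptured):
    """Simplest capture-recapture analogue of the same ratio argument.

    If `marked` animals are tagged and released, and a later sample of
    `caught` animals contains `recaptured` tagged ones, then setting
    marked / N = recaptured / caught gives N = marked * caught / recaptured.
    """
    return marked * caught / recaptured


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the arithmetic.
    print(laplace_ratio_estimate(1000, 50, 2))       # 25000.0
    print(capture_recapture_estimate(100, 80, 16))   # 500.0
```

In both functions the known ratio from a subsample is scaled up to the whole population, which is the common idea behind the many forms of the method.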

The first controversy the authors discuss involves the protection of the northern spotted owl and forest management in the U.S. Pacific Northwest. To help settle this controversy, studies are carried out to see if the death rate of spotted owls is going down as the result of current forest management policies. Here is how one type of study is carried out.

A number of adult spotted owls are captured, tagged, and released. At the end of each year, an attempt is made to recapture these owls. We will call these capture times. At these times some owls will be recaptured, some will have died, and some will still be alive but will be missed in the recapture process. Let s(i) be the probability that an owl that has survived until the ith capture time survives to the (i+1)st capture time, and let p(i) be the probability that an owl that has survived until the ith capture time is captured at that time. The data for each owl are recorded as a sequence of 0's and 1's, where a 1 means the owl was captured and a 0 means that it was not. For example, for a three-year period, the data for a specific owl can be represented by one of the four sequences 100, 101, 110, or 111. Then 101 means the owl survived through the third capture time, was not recaptured at the second capture time, but was recaptured at the third capture time. The probability of this sequence is s(1)*(1-p(2))*s(2)*p(3). The product of such probabilities over all the owls gives the probability of obtaining the data observed. The parameters p(i) and s(i) are then chosen to maximize this probability (the maximum likelihood method), giving the desired estimate for the survival rates s(i).
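The probability of a capture history can be computed mechanically, which may help students check the 101 example by hand. The sketch below is our own illustration, not code from the article. It assumes every history starts with a 1 (the initial capture), uses 0-based Python lists so that s[i-1] plays the role of s(i) and p[i-1] the role of p(i), and treats a trailing run of 0's as "either died or was alive but missed." The parameter values are hypothetical.

```python
def history_probability(history, s, p):
    """Probability of a capture history such as "101", given first capture.

    s[i] is the probability of surviving from capture time i+1 to i+2
    (0-based list); p[i] is the probability of being caught at capture
    time i+1. p[0] is never used, because the first capture is given.
    """
    k = len(history)
    last = history.rfind("1")  # last time the owl was actually seen

    # Up to the last sighting the owl is known to be alive.
    prob = 1.0
    for i in range(1, last + 1):
        prob *= s[i - 1]
        prob *= p[i] if history[i] == "1" else 1.0 - p[i]

    # After the last sighting it either died or survived unseen.
    never_seen_again = 1.0
    for i in range(k - 1, last, -1):
        never_seen_again = (1.0 - s[i - 1]) + s[i - 1] * (1.0 - p[i]) * never_seen_again
    return prob * never_seen_again


if __name__ == "__main__":
    s = [0.8, 0.9]       # hypothetical s(1), s(2)
    p = [1.0, 0.6, 0.7]  # placeholder for p(1), then hypothetical p(2), p(3)
    for h in ("100", "101", "110", "111"):
        print(h, history_probability(h, s, p))
```

For 101 the function returns s(1)*(1-p(2))*s(2)*p(3), matching the expression in the text, and the four probabilities sum to 1. Multiplying such terms over all owls gives the likelihood that would then be maximized.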

The article shows the results of such a study for adult northern spotted owls for the years 1985-1992. There appeared to be a significant negative trend in the survival probabilities for adult females, but not for males.

Another example illustrating a different type of study involves polar bears in Alaska. The study was designed to see how the polar bears are surviving the harvest permitted to Native Alaskan subsistence hunters. This was an aerial study aimed at estimating the number of polar bears in the southern Beaufort Sea by the "line transect method."

This method is used when it is desired to estimate the animal population in a region where one cannot be sure that one has counted all the animals even in a subregion. Assume that we want to estimate the number of animals in a strip bordered by the two lines a distance w from a line L. A researcher travels along the line L looking for animals. For each animal spotted, he estimates and records the perpendicular distance from the animal to the line L.

If an animal is a distance x from L, we assume that it will be seen with a probability g(x), where g(0) = 1. Assume further that the animals are randomly distributed in the strip. Then the probability that an animal in the strip is observed is the average a of g(x) between 0 and w. Then if N is the number of animals in the strip, and n are observed, E(n) = Na. Thus n/a should be a reasonable estimate for N.

Thus to estimate N, we need only estimate a from our data consisting of x(1), x(2), ..., x(n) -- the perpendicular distances to the line for the animals seen. The x's are independent with a common distribution and, again using the fact that the perpendicular distances of the animals are uniformly distributed, we find that the x's have density f(x) = g(x)/(w*a) for x between 0 and w. Because g(0) = 1, w*a = 1/f(0). Therefore, to estimate a we need only estimate f(0). This can again be done by the maximum likelihood method. For example, it is often assumed that the probability of observing an animal decreases exponentially with the distance to L, i.e., g(x) = e^(-x/m). Then, provided w is large compared with m, f(x) is approximately the exponential density f(x) = (1/m)*e^(-x/m), and w*a = 1/f(0) = m, the mean of the x(i). The obvious estimator for m is the average of the x's, which is the maximum likelihood estimate.
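The exponential case reduces to a few lines of arithmetic. The sketch below is a minimal illustration under stated assumptions: detection falls off as g(x) = e^(-x/m), the strip half-width w is large compared with m, and the observed distances therefore have approximate density g(x)/(w*a), so that w*a is estimated by the average distance. The function name and the distances are hypothetical.

```python
from statistics import mean


def line_transect_estimate(distances, half_width):
    """Estimate the number of animals in the strip from sighting distances.

    distances: perpendicular distances x(1), ..., x(n) of the animals seen.
    half_width: the distance w from the line L to each edge of the strip.
    """
    n = len(distances)
    m_hat = mean(distances)      # maximum likelihood estimate of m
    a_hat = m_hat / half_width   # estimated probability of seeing an animal
    return n / a_hat             # estimate of N, from E(n) = N * a


if __name__ == "__main__":
    # Hypothetical data: four sightings in a strip of half-width 500.
    print(line_transect_estimate([10.0, 20.0, 30.0, 40.0], 500.0))  # 80.0
```

Here the average distance is 25, so an animal in the strip is estimated to be seen with probability 25/500 = 0.05, and the four sightings scale up to an estimate of 80 animals.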

This is an excellent article illustrating how statistics is used in current issues. The article is not as technical as our review of it. Reading the article made us curious how these methods really work, so we delved into the excellent references that the article provides.

"Majoring in Money"

by Richard Morin. The Washington Post, 24 March 1996, C5.

Richard Morin writes a weekly column called "Unconventional Wisdom" that appears each Sunday in The Washington Post. In his column Morin reviews current studies in the social sciences. In this particular column he considered a study of special interest to students. This study was reported in an article "Earnings of College Graduates, 1993" by Daniel E. Hecker (Monthly Labor Review, December 1995, pp. 3-17). Hecker, an economist for the Federal Bureau of Labor Statistics, was interested in how people with different college majors fared financially in their future jobs.

In 1993, the National Science Foundation sponsored a survey of a sample of individuals under age 75 who had reported having a bachelor's or higher level degree in the 1990 census. Data including college major and later job information were obtained from 215,000 persons. Hecker considered those who had full time jobs and presented average incomes for men and women with different undergraduate majors for three age groups: young (25-34), midcareer (35-44), and older (45-64). He used this information to find the majors that led to the highest average incomes and those that led to the lowest incomes. The top five and the bottom five majors for men and women are listed at the end of this review. Men who were mathematics majors had the second highest average income in mid-life, beaten only by engineering majors. Hecker also provides a breakdown according to types of jobs the majors actually had. From this one can see that only 9% of the male mathematics majors ended up working in the mathematical sciences.

Hecker also commented on some of the differences between men and women and gave possible explanations for these differences. Looking at the three age groups, we find the average incomes over all job categories for men were $35,694, $43,199 and $49,390. The corresponding figures for women were $29,660, $32,155, and $32,093.

Obvious explanations for the lower salaries for women include sex discrimination, family and lifestyle choices, and the fact that many women choose or are pushed toward majors that lead to less lucrative careers such as social work, home economics, and teaching. The lack of increase for women over the years seems to us more mysterious.

Annual Earnings by College Undergraduate Major --
Men and Women Aged 35-44

Men:  Top Five Majors

1.  Engineering                                    $53,286
2.  Mathematics                                    $51,584
3.  Computer science                               $50,509
4.  Pharmacy                                       $50,480
5.  Physics                                        $50,128

Women:  Top Five Majors

1.  Economics                                      $49,170
2.  Engineering                                    $49,070
3.  Pharmacy                                       $48,427
4.  Architecture                                   $46,353
5.  Computer science                               $43,757

Men:  Bottom Five Majors

1.  Philosophy/Religion                            $31,848
2.  Social work                                    $32,171
3.  Visual and performing arts                     $32,972
4.  Foreign language/linguistics                   $33,780
5.  Education                                      $34,470

Women:  Bottom Five Majors

1.  Philosophy/Religion                            $25,788
2.  Education                                      $27,988
3.  Home economics                                 $28,275
4.  Social work                                    $28,594
5.  Agriculture                                    $28,751

Source: Bureau of Labor Statistics

"Washington News,"

Nature, 30 May 1996, 355.

The National Science Foundation recently published their biennial survey report "Science and Engineering Indicators 1996: On the Statistical Status of U.S. Science."

Jon Miller, the main author of the report's section on public understanding of science, stated that there is no evidence of the much touted anti-science movement. The percentage of people who believe that "the benefits of scientific research have outweighed the harmful results" has been very close to 70% since the survey was started in 1979, with the exception of 1988, when it was a bit over 80%.

However, Miller reported that the public's support of science is not matched by an understanding of how science works. For example, although half the sample agreed that a clinical trial should put 500 subjects on a drug and keep 500 off it -- rather than putting 1,000 people on the drug right away -- almost half of them thought the reason for keeping the 500 on placebo was to save them from the risk of being poisoned.

"Jingle Man Retreads Pieces of Pop Hits in Car-Dealer Ditties"

by Oscar Suris. The Wall Street Journal, 11 June 1996, 1.

John Giaier is a former studio musician who now makes a career writing jingles for ads for car dealers. To cut his costs in this competitive business, he has turned to an interesting application of sampling. Using a database of snippets from songs featured in weekly issues of "Billboard" magazine, his computer links selections into potential jingles -- and even prints out the sheet music! This approach allows Giaier's firm to turn out about five successful "new" jingles a week. The resulting tunes apparently contain enough familiar notes and chords to have listeners readily humming them, but they are not so close to published songs that Giaier has to pay royalties.

John Lofrumento, Chief Executive Officer of the American Society of Composers, Authors and Publishers (ASCAP), believes the approach may violate copyright laws. Defending his firm's practices, Giaier says he learned from his experience in the music industry that "anything you write musically has been written before."

"The Tipping Point"

by Malcolm Gladwell. The New Yorker, 3 June 1996, 32-38.

The crime rate in New York City has dropped dramatically in the last few years. There have been a number of attempts to explain this decrease. William J. Bratton, who was New York Police Commissioner during most of the decline, argues that new policing strategies made the difference. Criminologists put forward more general demographic and social explanations -- the aging of the population, the stabilization of the crack trade, longer prison terms, and so on. However, the scale of these changes seems too small to fit the scale of the changes that have occurred in the crime rates in New York. This has suggested looking for a new explanation. This article reports that researchers are turning to the theory of epidemics to explain changes in crime rates.

The mathematical theory of epidemics was developed to study the spread of a disease through a population -- such as the spread of AIDS. A simple model for epidemics can be described as follows: Let N be the population size, assumed to be constant. Let S(t) be the number of people at time t who are susceptible to getting the disease, let I(t) be the number infected at time t, and let R(t) be the number removed by time t, whether by having already had the disease, by dying, or by becoming immune. It is assumed that the rate at which the number of susceptibles, S(t), decreases is proportional to the number of contacts between the infected and the susceptible people and that the rate of change of the number removed, R(t), is proportional to the number infected. Thus,

S'(t) = -bSI, b > 0, and

R'(t) = rI, r > 0,

where b is the infection rate and r the removal (recovery) rate. Because

S(t) + I(t) + R(t) = N,

we have

S'(t) + I'(t) + R'(t) = 0, and

I'(t) = bSI - rI = (bS-r)I.

From this last equation we see that if S < r/b, the number of infected decreases and, in fact, the model predicts it will decrease to 0. If S > r/b, the number infected increases, resulting in an epidemic. Solving the equations shows that the number infected increases to a maximum value and then decreases to 0. The number of susceptibles decreases to a limiting value > 0, so there will always be some people not affected by the disease.

This model suggests intervening to try to make S < r/b by decreasing S, increasing r, or decreasing b. The number of susceptibles S could be decreased, for example, by vaccination.
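The threshold behavior can be checked numerically. The following is a minimal sketch in Python that steps the three equations forward with a simple Euler method; the function name, the step size, and the parameter values (N = 1000, b = 0.0005, r = 0.1, so that r/b = 200) are illustrative assumptions of ours, not values from the article.

```python
def simulate_sir(s0, i0, r0, b, r, dt=0.01, t_max=200.0):
    """Euler integration of S' = -bSI, I' = (bS - r)I, R' = rI.

    Returns a list of (t, S, I, R) tuples. All parameter values used
    below are hypothetical and chosen only to illustrate the threshold.
    """
    s, i, removed = float(s0), float(i0), float(r0)
    history = [(0.0, s, i, removed)]
    steps = int(round(t_max / dt))
    for k in range(1, steps + 1):
        new_infections = b * s * i * dt   # susceptibles who become infected
        new_removals = r * i * dt         # infected who are removed
        s -= new_infections
        i += new_infections - new_removals
        removed += new_removals
        history.append((k * dt, s, i, removed))
    return history


B, R = 0.0005, 0.1          # hypothetical infection and removal rates
THRESHOLD = R / B           # critical number of susceptibles, r/b

# Case 1: S(0) = 990 is above the threshold, so an epidemic develops.
above = simulate_sir(s0=990, i0=10, r0=0, b=B, r=R)
# Case 2: S(0) = 150 is below the threshold, so the infection dies out.
below = simulate_sir(s0=150, i0=10, r0=840, b=B, r=R)

peak_infected = max(i for _, _, i, _ in above)
final_susceptible = above[-1][1]

if __name__ == "__main__":
    print(f"threshold r/b = {THRESHOLD:.0f}")
    print(f"above threshold: peak I = {peak_infected:.1f}, "
          f"final S = {final_susceptible:.2f}")
    print(f"below threshold: final I = {below[-1][2]:.6f}")
```

With the starting value above r/b, the number infected should rise to a peak and then fall toward 0 while some susceptibles remain; with the starting value below r/b, the number infected should fall from the outset.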

In applying this epidemic model to the problem of crime rates, one might interpret the susceptible people as those who might become criminals, the infected as those who become criminals, and the removed as those who end up in jail or die. The model predicts a threshold effect (called tipping in this article). If the number susceptible to becoming criminals is small enough, the crime rate will decrease, but, if it is greater than the critical value r/b, the crime rate will increase significantly. One could try to prevent a crime epidemic by intervening. The number of susceptibles S could be decreased by providing more jobs, the removal rate r could be increased by giving longer jail terms, and the threat of longer jail sentences might also decrease the infection rate b.

Of course the New Yorker article does not have all of this mathematics and does not really explain what a mathematical model is or how it might be used. However, the author mentions specific researchers who have used nonlinear models to explain social phenomena. One of the most interesting is described in an early paper by Thomas Schelling (Journal of Mathematical Sociology, 1971, 1(1), 143-186).

Schelling wanted to try to understand how people of different races segregate themselves when they have a choice of where to live. He discusses his model in terms of the movement in and out of a neighborhood with blacks and whites, but his ideas apply just as well to segregation of men and women, students and faculty, young and old, and other dichotomies.

He illustrates his models in terms of very simple experiments that the reader is invited to carry out. Start with a rectangle divided into small squares like a checkerboard. Randomly choose about 25% of the squares to remain blank and randomly distribute black and white counters on the rest of the squares. The counters represent people living on the squares. A person's neighbors are the people on the eight adjacent squares. Assume that if a person sees that fewer than half of his neighbors are of the same color, he desires to move. If he moves, he will move to the nearest empty square where he is satisfied, i.e., where at least 50% of the neighbors are his color. Now choose a method for deciding on the order in which people are allowed to move and let them move until everyone is satisfied. See what kinds of patterns arise.
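The checkerboard experiment is easy to try on a computer. The following Python sketch is one possible reading of the rules, not Schelling's own procedure: the function names, the 20 by 20 board, the random order of moves, the use of "king's move" distance for "nearest," and the rule that a person with no neighbors is satisfied are all assumptions of ours.

```python
import random


def make_board(rows, cols, blank_frac, rng):
    """Leave about blank_frac of squares empty; split the rest between B and W."""
    squares = [(r, c) for r in range(rows) for c in range(cols)]
    rng.shuffle(squares)
    n_blank = round(blank_frac * len(squares))
    board = [[None] * cols for _ in range(rows)]
    for k, (r, c) in enumerate(squares[n_blank:]):
        board[r][c] = "B" if k % 2 == 0 else "W"
    return board


def neighbor_colors(board, r, c):
    """Colors of the counters on the (up to) eight adjacent squares."""
    rows, cols = len(board), len(board[0])
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= rr < rows and 0 <= cc < cols:
                if board[rr][cc] is not None:
                    found.append(board[rr][cc])
    return found


def is_satisfied(board, r, c, color, threshold):
    """Satisfied if at least `threshold` of the neighbors share `color`.
    Assumption: a person with no neighbors at all is satisfied."""
    nbrs = neighbor_colors(board, r, c)
    return not nbrs or nbrs.count(color) / len(nbrs) >= threshold


def one_round(board, rng, threshold):
    """Visit everyone once in random order; return the number of moves made."""
    rows, cols = len(board), len(board[0])
    people = [(r, c) for r in range(rows) for c in range(cols)
              if board[r][c] is not None]
    rng.shuffle(people)
    moves = 0
    for r, c in people:
        color = board[r][c]
        if is_satisfied(board, r, c, color, threshold):
            continue
        board[r][c] = None  # lift the counter before judging other squares
        empties = [(rr, cc) for rr in range(rows) for cc in range(cols)
                   if board[rr][cc] is None and (rr, cc) != (r, c)]
        empties.sort(key=lambda p: max(abs(p[0] - r), abs(p[1] - c)))
        target = next((p for p in empties
                       if is_satisfied(board, p[0], p[1], color, threshold)), None)
        if target is None:
            board[r][c] = color  # nowhere acceptable: stay put
        else:
            board[target[0]][target[1]] = color
            moves += 1
    return moves


def mean_same_color_share(board):
    """Average, over people with neighbors, of the share of like-colored neighbors."""
    shares = []
    for r, row in enumerate(board):
        for c, color in enumerate(row):
            if color is not None:
                nbrs = neighbor_colors(board, r, c)
                if nbrs:
                    shares.append(nbrs.count(color) / len(nbrs))
    return sum(shares) / len(shares)


def run(rows=20, cols=20, blank_frac=0.25, threshold=0.5, seed=0, max_rounds=100):
    """Play rounds until nobody moves (or max_rounds is reached)."""
    rng = random.Random(seed)
    board = make_board(rows, cols, blank_frac, rng)
    before = mean_same_color_share(board)
    for _ in range(max_rounds):
        if one_round(board, rng, threshold) == 0:
            break
    return board, before, mean_same_color_share(board)


if __name__ == "__main__":
    board, before, after = run()
    print(f"mean share of like-colored neighbors: {before:.2f} -> {after:.2f}")
    for row in board:
        print("".join(cell or "." for cell in row))
```

Changing `threshold` or `seed` in the call to `run` is a quick way to see how the final pattern depends on the rules and on the starting arrangement.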

Schelling shows what happens in such a model as you change some of the rules. For example, people of one color might want to move if even 20% of the neighbors are of a different color.

Schelling applies his model to try to explain the observed phenomenon of "white flight" from neighborhoods in the 1950s. The percentage of a minority that would cause this flight was called "the tipping point," and about 20% was considered the tipping point for most whites.

"Clean Air Regulations Are Paying Off, EPA Says"

by Gary Lee. Washington Post, 10 June 1996, A17.

The Environmental Protection Agency (EPA) has been under attack by Republican lawmakers for spending too much of the taxpayers' money without much to show for it. The EPA carried out an analysis to weigh the costs against the benefits of the enforcement of the 1970 Clean Air Act. A draft of the study has been made available to lawmakers and the press. The report found that "in 1990, Americans received roughly 20 dollars in value in reduced costs, risks of death, illness, and other adverse effects for every one dollar spent to control air pollution."

The Clean Air Act required industry to reduce and control emissions of sulfur dioxide, nitrogen oxide, ozone, particulate matter, carbon monoxide, and lead.

The authors of the report estimated that, over the 20-year period reviewed, industry and private individuals paid over $436 billion in anti-pollution technology and increased costs for goods. The study expressed amounts in 1990 dollars.

The authors calculated the effect of the reductions in pollutants on public health. They estimated that over this period the decreased pollution reduced the number of heart attacks by 18,000, the number of strokes by 13,000, the cases of respiratory illnesses by 15,000, and of hypertension by 16,000.

They estimated that each death due to air quality problems costs $4.8 million, each heart attack $587,000 and each hospital admission for respiratory problems $7,500. Each workday lost due to air quality problems costs $83, and each IQ point lost due to lead poisoning or other air pollutants costs $5,550. Taking all this into account led to an estimate of $6.8 trillion in benefits over the 20-year period.
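As a quick check on the arithmetic, the 20-year totals can be compared directly. This is a minimal Python sketch; the variable names are ours, and the cost figure is the lower bound ("over $436 billion") reported in the article.

```python
# Totals reported in the article, in 1990 dollars.
total_benefits = 6.8e12   # estimated benefits over the 20-year period
total_costs = 436e9       # estimated costs ("over $436 billion")

# Benefit received per dollar spent over the whole period.
benefit_per_dollar = total_benefits / total_costs

if __name__ == "__main__":
    print(f"benefit per dollar of cost: about {benefit_per_dollar:.1f}")
```

Over the full period this gives a ratio of roughly 16 to 1. The figure of roughly 20 to 1 quoted from the report refers to 1990 alone.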

"Census Plan for 2000 Is Challenged on Two Fronts"

by Steven A. Holmes. The New York Times, 6 June 1996, A21.

Both minority groups and Republicans are challenging the Census Bureau's plans for counting the population in the year 2000. Minority groups argue that the new plan would worsen the problem of undercounting blacks and Hispanics, while Republicans counter that the new technique would result in improperly drawn legislative districts.

The Census Bureau plans to count at least 90% of the households in each county and then use statistical sampling methods to estimate the number missed. Representative Carrie P. Meek, a Democrat from Florida, introduced a bill with an alternative method for sampling. It would require the Census Bureau to count 90% of the households in each census tract, a geographical unit that is much smaller and more ethnically homogeneous than a county, and then use statistical sampling methods to estimate the missed population within the tract. She argues that a census taker, counting in a county with a minority-dominated city and predominantly white suburbs, might reach 90% by counting all the white suburbanites and a lower percentage of the minority inner-city dwellers. This would increase the chance of error in estimating the missed minority population.
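Representative Meek's argument can be illustrated with a small calculation. The following Python sketch uses an entirely hypothetical county (60,000 suburban households and 40,000 inner-city households); the numbers and names are ours and do not come from the article or the Census Bureau.

```python
def overall_coverage(areas):
    """Share of all households counted, given (households, coverage) per area."""
    total = sum(households for households, _ in areas.values())
    counted = sum(households * coverage for households, coverage in areas.values())
    return counted / total


# Hypothetical county: every suburban household is counted,
# but only 75% of inner-city households are.
county = {
    "suburbs": (60_000, 1.00),
    "inner city": (40_000, 0.75),
}

coverage = overall_coverage(county)
missed = {name: round(households * (1 - cov))
          for name, (households, cov) in county.items()}

if __name__ == "__main__":
    print(f"county-wide coverage: {coverage:.0%}")
    for name, n in missed.items():
        print(f"households left to estimate in {name}: {n}")
```

In this made-up county the 90% target is met even though all 10,000 households left to be estimated by sampling are in the inner city. A 90% requirement applied tract by tract would not permit this.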

On the other hand, Representative Tom Petri, a Republican from Wisconsin, introduced a plan that would prohibit the Census Bureau from applying sampling techniques of any kind to determine the population count. Mr. Petri says the Census Bureau should try to obtain an accurate count without sampling.

The Census Bureau acknowledges that its 1990 count left many Americans uncounted, especially members of minority groups who lived in central cities. The number of blacks uncounted was six times greater than the number of uncounted whites, the largest undercount differential since 1940.

"Mr. Statistics" in "Keeping Up"

by Daniel Seligman. Fortune Magazine, 10 June 1996, 161.

Mr. Statistics responds to a reader about a bet he made with Warren Buffett (who I gather is a contender for the country's richest man). He bet that Roger Maris's record of 61 home runs would be broken this year. The reader wanted to know if the tax implications should affect the odds. Mr. Statistics says that if you have to report your winnings, then it does affect the odds and explains how this would work if, for example, you play roulette.

The bet with Mr. Buffett was made on April 30, and Buffett gave Mr. Statistics 4 to 1 odds, betting that no player would hit 62 home runs this season. In a previous column (Fortune Magazine, 13 May 1996), Mr. Statistics calculated that there was an 18.4% chance that one of seven players -- Albert Belle, Barry Bonds, Matt Williams, Frank Thomas, Ken Griffey, Jr., Dante Bichette, or Mark McGwire -- would get 62 home runs this year.

Using performance data from the past two seasons, he calculated for each player the expected number of times at bat and the probability of getting a home run when at bat. He then used the binomial distribution to find the probability of 62 or more successes with n = the expected number of times at bat and p = the player's probability of getting a home run when at bat.

Mr. Statistics found that Cleveland player Albert Belle had the best chance of breaking the record. He estimated that Belle had, at the beginning of the season, an 8.98% chance of hitting a home run when at bat and an expected number of times at bat of 604. Using the binomial distribution with n = 604 and p = .0898, he found the probability of Belle's breaking the record to be .15. By the same method he found that the next best bet is Frank Thomas of the White Sox, who has a 3% chance of breaking the record. Adding these probabilities for the seven players gave him the estimated chance of 18.4% that one of the seven would break the record. This corresponds to odds of 4.4 to 1 against. However, the bet was that any player, not just one of these seven, would break the record, so maybe 4 to 1 is about right.
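The calculation for Belle can be reproduced with a few lines of Python. This is a minimal sketch of the binomial tail probability described above, using the values of n and p given in the column; the function names are ours, and the probabilities are computed on the log scale only to avoid numerical underflow.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p), computed via logarithms."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def prob_at_least(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(k, n, p) for k in range(k_min, n + 1))


# Belle: 604 expected at bats, 8.98% chance of a home run per at bat.
belle_breaks_record = prob_at_least(62, 604, 0.0898)

# Odds against an event with the column's combined probability of 18.4%.
p_any_of_seven = 0.184
odds_against = (1 - p_any_of_seven) / p_any_of_seven

if __name__ == "__main__":
    print(f"P(Belle hits 62 or more) = {belle_breaks_record:.3f}")
    print(f"odds against one of the seven: {odds_against:.1f} to 1")
```

The tail probability should come out close to the .15 reported in the column, and the odds against work out to about 4.4 to 1.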

"Three Coins in the Fountain"

in "Fallacies, Flaws and Flimflam," by Ed Barbeau (ed.). The College Mathematics Journal, May 1996, 27(3), 204.

When three fair coins are tossed independently, what is the chance that they all come up alike? This article presents a paradoxical solution attributed to Sir Francis Galton, which will be discussed by Ruma Falk in a forthcoming article in Teaching Statistics. Galton gave the correct answer of 1/4, noting that HHH and TTT are the two favorable outcomes among the eight equally likely possibilities. But he went on to say that he had heard the following proposed solution: "At least two coins must turn up alike, and it is an even chance whether the third coin is heads or tails; therefore the chance of all alike must be 1/2."
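Students can verify the correct answer by listing the sample space. The following short Python sketch enumerates the eight equally likely outcomes; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# All 8 equally likely outcomes for three fair coins, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Outcomes in which all three coins show the same face: HHH and TTT.
all_alike = [o for o in outcomes if len(set(o)) == 1]
p_all_alike = Fraction(len(all_alike), len(outcomes))

# Outcomes in which at least two coins show the same face.
at_least_two_alike = [o for o in outcomes
                      if max(o.count("H"), o.count("T")) >= 2]

if __name__ == "__main__":
    print("P(all alike) =", p_all_alike)
    print("outcomes with at least two alike:", len(at_least_two_alike))
```

Every one of the eight outcomes has at least two coins alike, so that observation gives no information about the result. Which coin counts as the "third" depends on how the coins fell, and that is where the proposed solution goes wrong.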

"Fly Me; Why No Airline Brags: `We're the Safest'"

by Adam Bryant. The New York Times, 9 June 1996, Sec. 4, p. 1.

Airline officials try to tell the public that if they allow a U.S. airline to fly, then the airline is considered safe. However, the public continues to want to know how safe it is to fly. Arnold Barnett at the Massachusetts Institute of Technology has studied air safety statistics for 20 years. He has said that if one boarded a flight on a randomly chosen airline every day, on average one would fly for about 21,000 years before dying in a crash. However, travelers' suspicions that the Federal Aviation Administration (FAA) had a fuller answer were confirmed when an internal ranking of safety records of airlines was obtained by the Chicago Tribune and then released to the rest of the press by the FAA. Here is the part of the ranking relating to the major U.S. airlines.

The FAA chart below counts an accident as any incident that results in serious personal injury or substantial damage to the plane. It covers the period from 1990 through May 1996.

Accidents per 100,000 departures for selected airlines

                    Accident rate  Departures

Tower Air                   8.680      23,041
Valujet                     4.228     118,264
Carnival                    2.740      72,988
American Trans Air           .562     177,662
United                       .452   4,651,020
Continental                  .349   3,150,349
American                     .338   5,622,932
Delta                        .308   6,162,160
USAir                        .242   5,782,745
America West                 .232   1,281,205
TWA                          .227   1,763,784
Southwest Airlines           .217   3,233,102
Northwest                    .173   3,477,273
Alaska                       .126     796,495
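
A useful classroom exercise is to recover the approximate number of accidents behind each rate. The following Python sketch multiplies each rate by the number of departures and divides by 100,000; the variable names are ours, and the results are only approximate because the published rates are rounded.

```python
# (accident rate per 100,000 departures, departures), copied from the table.
airlines = {
    "Tower Air": (8.680, 23_041),
    "Valujet": (4.228, 118_264),
    "Carnival": (2.740, 72_988),
    "American Trans Air": (0.562, 177_662),
    "United": (0.452, 4_651_020),
    "Continental": (0.349, 3_150_349),
    "American": (0.338, 5_622_932),
    "Delta": (0.308, 6_162_160),
    "USAir": (0.242, 5_782_745),
    "America West": (0.232, 1_281_205),
    "TWA": (0.227, 1_763_784),
    "Southwest Airlines": (0.217, 3_233_102),
    "Northwest": (0.173, 3_477_273),
    "Alaska": (0.126, 796_495),
}

# Implied accident count = rate * departures / 100,000.
implied_accidents = {
    name: rate * departures / 100_000
    for name, (rate, departures) in airlines.items()
}

if __name__ == "__main__":
    for name, count in implied_accidents.items():
        print(f"{name:<20}{count:6.1f}")
```

The three highest rates correspond to only about 2, 5, and 2 accidents, respectively, so students can discuss how much weight rates based on so few events and so few departures can bear.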
