This case study covers several exploratory data analysis ideas: the histogram and boxplot, kernel density estimates, the
recently introduced bagplot (a two-dimensional extension of the boxplot), and the violin plot, which combines a
boxplot with a density shape plot. We apply these ideas and demonstrate how to interpret the output from these tools in
the context of data on living standards in Vietnam. The level of the presentation is suitable for an upper-level undergraduate
or beginning graduate course in applied statistics. We use data from the Vietnam Living Standards Survey of 1998 (VLSS98)
and from the 2000 Vietnam statistical yearbook, the statistical package Stata, and special programs provided by the authors
who introduced the bagplot and the violin plot.
Key Words: Bagplots; Boxplots; Histograms; Kernel density estimators; Vietnam Living
Standards Surveys; Violin plots.
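
As a rough illustration of what a boxplot summarises, the sketch below computes the quartiles, the conventional fences at 1.5 times the interquartile range, and the points a boxplot would draw as outliers. The sample values and the function name `boxplot_summary` are made up for illustration and are not taken from VLSS98; the quartile rule used here (Python's "inclusive" method) is an assumption and may differ slightly from the one Stata applies.

```python
import statistics

def boxplot_summary(values):
    """Return quartiles, whisker ends and outliers using the 1.5 * IQR rule."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers extend to the most extreme observations inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }

# Hypothetical sample with one extreme value (not survey data).
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
summary = boxplot_summary(sample)
print(summary)
```

The single extreme value ends up beyond the upper fence, so it is reported separately rather than stretching the whisker.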
This paper extends work on the construction of instructional modules that use graphical and simulation techniques for
teaching statistical concepts (Marasinghe, et al. 1996; Iverson and Marasinghe 2001). These modules consist of two
components: a software part and a lesson part. The software part is a computer program written in LISP-STAT with a highly
interactive user interface that the instructor and the students can use to explore various ideas and concepts.
The lesson part is a prototype document that guides instructors in creating their own lessons using
the software module. This includes a description of concepts to be covered, instructions on how to use the module, and some
exercises. The regression modules described here are designed to illustrate various concepts associated with regression
model fitting, such as the use of residuals and other case diagnostics to check for model adequacy, the assessment of the
effects of transforming the response variable on the regression fit using well-known diagnostic plots, and the use of
statistics to measure the effects of collinearity on model selection.
Key Words: Active learning; Education; Lisp-Stat; Regression diagnostics; Simulation; Statistics
instruction.
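
The kind of case diagnostics mentioned above can be sketched in a few lines. The example below is not the LISP-STAT module itself; it is a minimal Python illustration, on made-up data, of residuals and leverages for a simple linear regression. The function name `simple_regression_diagnostics` and the 2p/n leverage cutoff are assumptions chosen for the illustration (the cutoff is a common rule of thumb).

```python
def simple_regression_diagnostics(x, y):
    """Fit y = a + b*x by least squares; return coefficients, residuals, leverages."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Leverage (hat-matrix diagonal) for simple linear regression.
    leverages = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return intercept, slope, residuals, leverages

# Made-up data: the last point sits far from the other x values.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0]
intercept, slope, residuals, leverages = simple_regression_diagnostics(x, y)

# Rule of thumb: flag leverages above 2p/n, with p = 2 coefficients here.
high_leverage = [i for i, h in enumerate(leverages) if h > 2 * 2 / len(x)]
print(high_leverage)
```

Two quick checks follow from the algebra: the residuals sum to zero, and the leverages sum to the number of fitted coefficients.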
In Bayesian statistics, the choice of the prior distribution is often controversial. Different rules for selecting priors
have been suggested in the literature, which sometimes produce priors that students find difficult to understand
intuitively. In this article, we use a simple heuristic to illustrate to students the rather counter-intuitive fact
that flat priors are not necessarily non-informative, and non-informative priors are not necessarily flat.
Key Words: Conjugate priors; Maximum likelihood estimation; Posterior mean.
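
One common way to see this point, which is not necessarily the heuristic the article itself uses, is to compare posterior means for binomial data under conjugate Beta priors. The sketch below assumes a Beta(a, b) prior, for which the posterior mean is (a + x) / (a + b + n). The flat Beta(1, 1) prior pulls the estimate away from the maximum likelihood estimate x / n, while the limiting, improper Beta(0, 0) prior, which is far from flat, reproduces it. The data values and the function name `posterior_mean` are hypothetical.

```python
from fractions import Fraction

def posterior_mean(a, b, successes, trials):
    """Posterior mean of p under a Beta(a, b) prior and binomial data."""
    return Fraction(a + successes, a + b + trials)

# Hypothetical data: 3 successes in 10 trials.
successes, trials = 3, 10
mle = Fraction(successes, trials)

# Beta(1, 1) is flat on [0, 1], yet it shifts the estimate toward 1/2.
flat = posterior_mean(1, 1, successes, trials)

# The limiting Beta(0, 0) prior is improper and not flat, yet it returns the MLE.
haldane = posterior_mean(0, 0, successes, trials)

print(mle, flat, haldane)
```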
Unless the sample encompasses a substantial portion of the population, the standard error of an estimator depends on the
size of the sample, but not the size of the population. This is a crucial statistical insight that students find very
counter-intuitive. After trying several ways of convincing students of the validity of this principle, I have finally
found a simple memorable activity that convinces students beyond a reasonable doubt. As a bonus, the data generated by
this activity can be used to illustrate the central limit theorem, confidence intervals, and hypothesis testing.
Key Words: Sample size; Sampling distribution; Standard error.
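
The principle can be checked numerically. The sketch below is not the classroom activity described above; it simply evaluates the standard error of a sample mean under simple random sampling without replacement, using the usual finite population correction sqrt((N - n) / (N - 1)). The values of `sigma`, `n` and the population sizes are made up for illustration.

```python
import math

def standard_error_of_mean(sigma, n, population_size=None):
    """SE of the sample mean; applies the finite population correction when N is given."""
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

sigma, n = 10.0, 100  # hypothetical population SD and sample size

# Same sample size, populations differing by a factor of 1000.
se_small_pop = standard_error_of_mean(sigma, n, population_size=10_000)
se_large_pop = standard_error_of_mean(sigma, n, population_size=10_000_000)

# Here the sample is half the population, so the correction matters.
se_half_sampled = standard_error_of_mean(sigma, n, population_size=200)

print(se_small_pop, se_large_pop, se_half_sampled)
```

The first two values are nearly identical, whereas the third is noticeably smaller, which matches the "unless the sample encompasses a substantial portion of the population" qualification.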
Statistical thinking is required for good statistical analysis. Among other things, statistical thinking involves
identifying sources of variation. Students in introductory statistics courses seldom recognize that one of the largest
sources of variation may come in the collection and recording of the data. This paper presents some simple exercises
that can be incorporated into any course (not just statistics) to help students understand some of the sources of
variation in data collection. Primary attention is paid to operational definitions used in the data collection process.
Key Words: Data collection; Operational definitions.
This study investigated the knowledge base necessary for choosing appropriate statistical techniques in applied research.
We compared the knowledge used by six experts and six novices in two types of statistical tasks: (1) comparing research
scenarios from the perspective of choosing a statistical technique, and (2) directly comparing statistical techniques.
The framework was based on expert knowledge in inferential statistics using the repertory grid
technique for data collection. A qualitative analysis of data showed that of the three types of expert knowledge, research
design knowledge comprised the biggest portion, with theoretical and procedural knowledge comprising relatively smaller
parts. Little difference was observed between experts and novices in extensiveness of knowledge use, although experts'
knowledge use was found to be more integrated than novices'. Finally, two implications were drawn regarding how to better
teach selection skills in statistics education: (1) statistical techniques should be taught in relation to relevant research
designs, and (2) conceptual connections between statistical techniques should be explicitly taught.
Key Words: Knowledge structure; Selection skills; Statistical expertise; Statistical literacy; Statistical
techniques.
Datasets and Stories
Bacteria are cultured in medical laboratories to identify them so patients can be treated correctly. The tryptone dataset
contains measurements of bacteria counts following the culturing of five strains of Staphylococcus aureus. It also
contains the time of incubation, temperature of incubation and concentration of tryptone, a nutrient. The question is
whether the conditions recommended in the protocols for the culturing of these strains are optimal. The task is to find
the incubation time, temperature and tryptone concentration that optimise the growth of this bacterium. These data may be
explored by students at several levels. Graphical methods can be used to investigate the relationship between the
variables. ANOVA with one-way, two-way and factorial models with interactions can be used to identify significant factors.
Multiple polynomial regression methods can be used to model the data, with optimal conditions estimated by partial differentiation.
Key Words: Analysis of variance; Exploratory data analysis; Interactions; Multiple regression;
Optimisation; Outlier; Polynomial regression.
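
To make the partial differentiation step concrete, the sketch below takes an already fitted second-order polynomial in two factors and solves the two equations obtained by setting its partial derivatives to zero. It is a simplified illustration under stated assumptions: the tryptone data involve three factors, only two are used here, and the coefficients, variable names (`t` for incubation time, `c` for concentration) and the function `stationary_point` are hypothetical rather than estimates from the dataset.

```python
def stationary_point(b1, b2, b11, b22, b12):
    """Solve the partial-derivative equations of the quadratic surface
    y = b0 + b1*t + b2*c + b11*t**2 + b22*c**2 + b12*t*c
    and report whether the stationary point is a maximum."""
    # dy/dt = b1 + 2*b11*t + b12*c = 0
    # dy/dc = b2 + b12*t + 2*b22*c = 0
    det = 4 * b11 * b22 - b12 ** 2  # determinant of the Hessian
    if det == 0:
        raise ValueError("no unique stationary point")
    # Cramer's rule for the 2 x 2 linear system above.
    t_opt = (-2 * b22 * b1 + b12 * b2) / det
    c_opt = (-2 * b11 * b2 + b12 * b1) / det
    # The Hessian is negative definite when b11 < 0 and det > 0.
    is_maximum = b11 < 0 and det > 0
    return t_opt, c_opt, is_maximum

# Hypothetical coefficients, equivalent to y = 10 - 0.01*(t - 24)**2 - 2*(c - 1)**2.
t_opt, c_opt, is_maximum = stationary_point(
    b1=0.48, b2=4.0, b11=-0.01, b22=-2.0, b12=0.0
)
print(t_opt, c_opt, is_maximum)
```

Checking the sign conditions matters in practice: a fitted surface can have a saddle point or a minimum, in which case the stationary point is not the optimum the protocol is looking for.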