Mervyn Marasinghe

Iowa State University

William M. Duckworth

Iowa State University

Tae-Sung Shin

StatSoft Inc.

Journal of Statistics Education Volume 12, Number 2 (2004), jse.amstat.org/v12n2/marasinghe.html

Copyright © 2004 by Mervyn Marasinghe, William M. Duckworth and Tae-Sung Shin, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

**Key Words:** Active learning; Education; Lisp-Stat; Regression diagnostics; Simulation; Statistics
instruction.

A primary goal of a new NSF-supported project targeted at improving the effectiveness of undergraduate statistics courses
has been to develop instructional tools that are easily adaptable for general use independent of specific courses. These
tools will provide statistics instructors with the capability of supplementing their teaching methods by presenting and
illustrating statistical concepts more effectively than is possible using conventional instructional methods. They allow
exploration of important concepts using graphical displays that are easy to use and provide instantaneous visual feedback,
thus encouraging *active* learning.

Under this project, we have developed a number of instructional modules designed to illustrate various statistical concepts
and provide important insights into the application of these concepts. Use of these modules in teaching will enable
students to get more meaningful learning experiences than are otherwise possible from traditional instructional methods
alone. Exercises designed to accompany these modules will further reinforce these learning experiences. The *software
component* of these modules employs a combination of state-of-the-art computing hardware, a statistical programming
environment, high resolution color graphics, computer simulation, and a highly interactive user interface. The modules
can be used both as classroom demonstration tools and as self-paced exercises for individual students or small groups.
The *instructional component* is a prototype lesson that includes a description of concepts to be covered,
instructions on how to use the module, and some exercises with solutions. Course coordinators and instructors can modify
or augment these lesson plans to create their own class assignments.

The software components of the modules described here have been programmed in the Lisp-Stat language. The system provides a computing environment with the following important features:

- It is a powerful object-oriented programming language that allows rapid prototyping and development.
- It allows development of modules using true dynamic graphics.
- It allows links to routines written in Fortran or C.
- It can be installed on the different instructional computing platforms in use at many educational institutions (Unix
workstations, Macintoshes, and desktop PCs running Windows software).
- It has a large core of statistical functions available.

Another important advantage of using Lisp-Stat for developing software is that, for noncommercial applications, it is available without cost. For a good introduction to Lisp-Stat and other Lisp-Stat based software, see de Leeuw (1994).

We have developed user interfaces that are consistent across modules. Students need only execute one command to start a module. All further interaction is through a mouse-driven interface. One may click on a push button to initiate action, click and drag on a menu button to select items from a pull-down menu or click and move slide-bars to set values of various parameters. Some of the dynamic graphical techniques used in the modules are discussed in detail in Cook and Weisberg (1994).

- An introduction to and a short description of the important statistical concept(s) to be covered.
- Objectives for the instructional module.
- Instructions on how to execute the software component of the module.
- Warm-up exercises to familiarize the student with the software module.
- Formal exercises and questions requiring execution of the software (usually following a reasonably precise set of
instructions), careful thought, interpretation of results, and explanation of conclusions.
- Notes for the instructor, including comments on what is expected to be understood from doing the formal exercises.

However, until recently, little use of these techniques has been made for instructional purposes though the possibilities are many. Saunders (1986) describes the use of dynamic graphics to produce "moving visualizations designed to introduce difficult concepts, reinforce mathematical ideas, and explore the techniques of probability modeling." The visualizations were used in a distance-learning television program produced by the BBC. The paper, for example, describes visualizations to illustrate the effect that parameter changes have on binomial, Poisson and bivariate normal distributions.

Another example is the STEPS project, a UK consortium based in Glasgow involving nine departments in seven universities, which was conceived for the purpose of developing problem-based teaching and learning materials for statistics. Modules were developed around specific problems in several subject areas and incorporated computational and graphical tools to assist in the exploration of the statistical ideas encountered in solving these problems. The problem-based approach was used to motivate interest in the students by selecting problems in their own areas of study and also because it allowed integration with other more standard laboratory materials. Trumbo (1994) uses graphics and simulation to illustrate elementary probability concepts, using programs written in QuickBasic and giving rough equivalents in Minitab.

Even fewer examples appear in the area of regression, where the techniques and concepts are ideally suited for illustration using dynamic graphics. Apart from the introductory examples in Tierney (1991) and the work of Cook and Weisberg (1994), a paper that describes software constructed for this purpose is Nurhonen and Puntanen (1992).

Anderson and Dayton (1995) present program code written to demonstrate various features available in the Lisp-Stat language. Rather than providing a complete set of educational modules, they illustrate how this language can be adapted for building instructional tools to enhance the teaching of various concepts in regression. That approach may not be suitable for instructors averse to learning the Lisp-Stat language at the required level for developing modules. On the other hand, no programming ability is required to use the modules presented in this paper. In addition, the lesson plans provided serve as templates for instructors for creating their own lessons. Users of these modules may find different ways to use them other than those suggested in the paper or the lessons. The present modules are also extensible in that anyone may add a module to the system without affecting current users of the modules.

A commercially available multimedia package ActivStats written by Paul Velleman contains several interactive modules that illustrate various concepts in regression. However, these modules are built around an introductory statistics course, and instructors may find integrating ActivStats with a regression course difficult. Also, ActivStats is not free.

There are several archives of JAVA applets available for regression. An example is the VESTAC system at www.kuleuven.ac.be/ucs/java/, described in Darius, Michiels, Raeymaekers, Ottoy, and Thas (2002). While some of these JAVA applets may be useful for demonstrating the same regression concepts covered in this paper, no lessons are provided with the applets. Using the JAVA applets requires a network connection to access the applets. This can result in slower execution and a less responsive interface than locally installed modules. Modules described in this paper are based on Lisp-Stat and can be customized using the Lisp-Stat language; however, the JAVA applets cannot be customized.

A more recent addition to the literature is the Cook and Weisberg (1999)
regression text accompanied by the software package Arc, also written in Lisp-Stat. Although designed specifically for
performing analyses described in the book, *Arc* could conceivably be used independently to demonstrate selected
regression concepts. A more useful suggestion for instructors using the above text and Arc is to use the modules described
in this paper as a supplement to their course.

Section 3 describes how to obtain the software described in this section. Readers may find it helpful to install and use the software while following the descriptions in Sections 2.2 through 2.6.

Statistics students from different disciplines often have various levels of ability in mastering some of the more sophisticated modern regression techniques they learn from textbooks and lectures. Interpretation of results produced by standard software packages, such as case deletion statistics or residual plots, is a complex task and can lead to confusion and improper use of such statistics. An adequate understanding of the concepts behind these techniques and some experience in using them can help alleviate this problem. However, students enrolled in regression courses cannot acquire the necessary experience through involvement in real statistical data analysis projects alone, because these projects are often too time consuming to incorporate more than a few into a course.

Some examples of the kinds of concepts that students need to understand for developing an ability to interpret regression computations are:

- Different graphical displays highlight different relationships among variables. To explain the relationship between a
Y and an X variable in a multiple regression model, both Y and X must be adjusted for effects of other explanatory
variables in the model.
- Interpretation of case statistics from a regression analysis is often far from straightforward. Not only does the
presence of more than one extreme observation tend to complicate their interpretation, different case statistics provide
fundamentally different types of diagnostic information.
- If diagnostic plots indicate departures from assumptions (e.g. nonnormality of residuals, nonhomogeneous errors, etc.),
a transformation of either the Y or the X variable (or both) may result in variables that more closely match the model
assumptions.

While standard homework assignments, lab exercises, and class projects provide mechanical practice, illustrate a few ideas,
and help to introduce these concepts to students, true *understanding* comes only after extensive experience.
Repeated interaction with graphical displays of simulation results allows students to be exposed to pseudo-experiences that
will facilitate their understanding of important concepts. For example, by plotting straight lines fitted to regression
data generated from a known fixed model repeatedly, the student can understand the difference between fitted models and the
*true* model. In another example, students can dynamically change the value of a data point to study the effect of
that point on the fitted regression line. By controlling the way that a change is effected (e.g., by holding X fixed and
changing Y in simple linear regression), the program can highlight a single property of a case statistic (such as the simple
fact that leverages depend only on the predictors). Examples like these illustrate the potential benefit of incorporating
modules like ours into a traditional regression course.

The **regteach1** module is useful for illustrating some of the fundamental concepts related to simple linear regression.
For example:

- A single summary statistic like a correlation coefficient or R^{2}, by itself, cannot be used to interpret the strength of a relationship. A scatterplot is an essential component of examining the relationship between two variables.
- It is important to understand the idea of least squares fitting. It can be demonstrated that one may not always be
minimizing the sum of squared deviations when "fitting a line by eye".
- Magnitudes of the residuals from a regression depend on the fitted line. Thus a simple residual plot can reveal a lot
about the goodness of the fit.

The frame on the left-hand side of Figure 1 shows the initial view of the
**regteach1** module window. Changing the slope and intercept values in the slide-bars will dynamically change the
slope and intercept of the plotted line and update the numerical coefficients in the box above the plot. Pushing the
**Select Data** menu button and using the resulting pull-down menu allows the user to select a data set from a list of
simple regression data sets. The frame on the right-hand side of Figure 1 shows
the **regteach1** module after the **OAK-SEEDLING** data set has been
selected.

Frame 1 | Frame 2 |

Figure 1. Two Frames of the

Pushing the **Residuals** button will produce an additional window containing a plot of residuals from the current
fitted line. The signed deviations are displayed as line segments drawn from a zero baseline, plotted at the corresponding
x-values. The residuals will change as the slope and intercept are changed in the main window. This can be used to
demonstrate dynamically the dependence of the magnitudes of the residuals on the fitted line (such as the fact that as the
fit improves the residuals get smaller), as well as identifying various patterns in the residuals (such as curvature) for
diagnosing the fit. Figure 2 shows the residual plots corresponding to two different
lines fitted to a data set.

Line 1 | Line 2 |

Figure 2:

The user can attempt to find the best fitting line to the selected data "by eye" (graphically) using the sliders to change
the slope and intercept; the plotted line will be updated dynamically. When satisfied with the line fitted "by eye", the
user can push the **Fit Least Squares** button to display the least squares line fitted exactly to the data and the
resulting summary statistics (including regression coefficients, sum of squared residuals, and the residual standard
deviation) as shown in Figure 3. Comparing the least squares regression line to
several different "by eye" fits provides the user the opportunity to learn the meaning of the least squares criterion for
fitting a line to data. The **Fit Least Squares** button also creates a scatterplot of the residuals plotted against
the X variable for making a visual assessment of the fit (e.g., curvature exhibited by the residuals may indicate
nonlinearity in the data). The **Correlation** button will display the correlation between X and Y. In addition to a graphical fit,
a user may also attempt to fit the best line numerically, by attempting to minimize the value of the RSS, or better, by
using the RSS "thermometer" displayed on the right margin of the primary window, both activated by pressing the
**Residual SS** button.
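The comparison between a "by eye" fit and the least squares fit can be sketched numerically. The following is a minimal illustration in Python, not the module's Lisp-Stat code; the data values and the "by eye" slope and intercept are hypothetical, chosen only to show that any line other than the least squares line has a larger residual sum of squares.

```python
def rss(xs, ys, intercept, slope):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical data, not one of the module's data sets.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

# A hypothetical "by eye" guess, as might be set with the two slide-bars.
eye_intercept, eye_slope = 0.5, 1.8
ls_intercept, ls_slope = least_squares(xs, ys)

rss_eye = rss(xs, ys, eye_intercept, eye_slope)
rss_ls = rss(xs, ys, ls_intercept, ls_slope)
print(f"by eye:        RSS = {rss_eye:.4f}")
print(f"least squares: RSS = {rss_ls:.4f}")
```

Trying several values for `eye_intercept` and `eye_slope` mimics moving the sliders while watching the RSS "thermometer": the RSS falls as the guess approaches the least squares line and never drops below `rss_ls`.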

Figure 3:

The **regteach2** module is constructed to illustrate some of the concepts related to statistical properties of
parameter estimates in the simple linear regression model. In particular, students will have an opportunity to examine the
sampling distribution of the slope estimate dynamically. Some of the ideas that can be explored with this module are:

- The fitted line passes through the "center" of the data, i.e., through the point $(\bar{x}, \bar{y})$.
- Variability in the data affects the accuracy of estimation of the regression coefficients.
- The estimate of the regression slope is symmetrically distributed around the true value of the slope.
- Parameter estimates depend on the data only through a few summary statistics.

If it is assumed that the errors are a random sample from $N(0, \sigma^2)$, the slope estimate b is distributed normally with mean $\beta_1$, the true value of the slope, and variance $\sigma^2 / \sum (x_i - \bar{x})^2$. Thus it is easy to see that for fixed values of the regressor variables, the distribution of the estimate depends only on $\sigma^2$.
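The distributional result above follows from writing the slope estimate as a linear combination of the responses. The derivation below is a standard textbook sketch; the symbols $\beta_0$, $\beta_1$, $S_{xx}$ and $c_i$ are notation assumed here for the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ with independent $N(0, \sigma^2)$ errors, and may differ from the notation used in the lessons.

```latex
\begin{aligned}
b &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{S_{xx}}
   = \sum_{i=1}^{n} c_i \, y_i,
   \qquad c_i = \frac{x_i - \bar{x}}{S_{xx}},
   \quad S_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \\
E(b) &= \sum_{i=1}^{n} c_i (\beta_0 + \beta_1 x_i) = \beta_1,
   \quad \text{since } \sum_{i=1}^{n} c_i = 0 \text{ and } \sum_{i=1}^{n} c_i x_i = 1, \\
\operatorname{Var}(b) &= \sigma^2 \sum_{i=1}^{n} c_i^2 = \frac{\sigma^2}{S_{xx}}.
\end{aligned}
```

Because b is a linear combination of independent normal responses, it is itself normally distributed, and with the x values held fixed its spread is governed by $\sigma^2$ alone.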

The initial window of the **regteach2** module is shown in the top frame of Figure 4.
A value for the variance is first selected for generating data
from the model (displayed in the box near the top margin of the plot) by moving the slide-bar. For each push of the
**Simulate Lines** button, seven new y values corresponding to the seven x values specified are generated, and the
straight line fitted to the data is plotted. By repeating this procedure one can observe dynamically the variation in the
lines fitted to different data sets generated from the same model. The slope of each fitted regression line constitutes a draw
from the sampling distribution of the slope estimate b. By pushing **Dist. of b** prior to starting the simulation
process, the user can observe these values being accumulated in a dynamically updated histogram. As the histogram is
updated with more and more samples of the estimated slope, the user is able to visualize that the distribution begins to
take the appearance of a Normal distribution centered at 2, the true value of the slope.
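The simulation just described can be imitated outside the module. The sketch below is written in Python rather than Lisp-Stat and is only an approximation of what the module does: the seven x values, the intercept, the two error standard deviations, the seed, and the number of repetitions are all hypothetical choices. Only the true slope of 2 is taken from the text.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least squares slope for a simple linear regression of ys on xs."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx


def simulate_slopes(xs, intercept, slope, sigma, n_sims, rng):
    """Generate n_sims data sets from a fixed model and return the fitted slopes."""
    slopes = []
    for _ in range(n_sims):
        ys = [intercept + slope * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(fit_slope(xs, ys))
    return slopes


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # hypothetical fixed x values
sxx = sum((x - statistics.mean(xs)) ** 2 for x in xs)
rng = random.Random(2004)                   # arbitrary seed for repeatability

for sigma in (0.5, 2.0):                    # hypothetical error standard deviations
    slopes = simulate_slopes(xs, 1.0, 2.0, sigma, 5000, rng)
    print(f"sigma = {sigma}: mean of b = {statistics.mean(slopes):.3f}, "
          f"sd of b = {statistics.stdev(slopes):.3f}, "
          f"theoretical sd = {sigma / sxx ** 0.5:.3f}")
```

A histogram of `slopes` plays the role of the module's dynamically updated histogram: it should be roughly bell-shaped and centered near 2, and wider for the larger error standard deviation.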

Figure 4:

The **Save 15 Lines** button, when pushed, will plot 15 such lines on the same graph for a fixed value of
$\sigma^2$ and save a copy of the graph in a separate, smaller window. This
is useful for comparing sets of lines obtained for different choices of $\sigma^2$,
as exhibited in Figure 5. These graphs clearly demonstrate that the variation in
the slope of the lines fitted to data is dependent on the actual variability in the data.

Figure 5:

Many regression programs in standard statistical packages produce a large number of the regression diagnostics proposed in the literature. However, not much attention is paid to describing or indicating how such statistics should be used or interpreted. Rather than providing students with additional tools for judging the adequacy of a fitted model, this vast array of statistics has added to the confusion of many students.

The **regteach3** module has been designed to illustrate important concepts related to the use of some of the more
popular case statistics in regression analysis:

- The leverages depend only on the explanatory variables. Cases away from the centroid of the data have large leverages
compared to those near the middle.
- The predicted response for cases with high leverages is largely determined by the observed response. Hence the fitted
line is constrained to pass as close to the corresponding data point as possible.
- Deletion of a single point can have a large effect on the fit of a model. A case is determined to be influential if
its deletion substantially affects the parameter estimates.
- Cook's D is a combination of an outlier measure and a leverage measure. An influential observation (i.e., one with a
large D value) that has small leverage is an "important outlier."

When **regteach3** is started up, the main window (top left frame in Figure 6)
shows a scatterplot of a working data set along with an overlaid regression line (displayed in magenta; the regression
statistics corresponding to the fitted line are also displayed in the same color). The three secondary windows display,
clockwise from the main window, indexed plots of the studentized residuals, Cook's D statistics, and the leverages,
respectively, computed with respect to the fitted regression line. Initially, all four windows of the module are in
*selecting mode*; i.e., a point may be selected by clicking the mouse pointer near it. By pushing the **Point
Moving Mode** button, the main window can be changed to the *point-moving* mode. In addition, all four plots are
*linked* with each other, so that if a point is *selected* in one of the secondary windows, the serial index
corresponding to the point will identify the point in *all* four windows.

Figure 6:

In the point-moving mode, points in the scatterplot can be dragged into other positions; however, in **regteach3** they
are constrained to move only in the vertical direction, so that the value of the X coordinate of the point moved remains
fixed. To demonstrate a case statistic like *leverage*, we will first move to a secondary window and *select*
one of the points indexed, say, 18 or 19. The selected point will be highlighted and labelled. These points are furthest
from the "centroid" of the x values (the centroid here is the mean $\bar{x}$), correspond to the
two largest leverages, and lie above the 2p/n reference line shown on the leverage plot. Now, moving to the main window,
the selected point is dragged to a new location (i.e., the point will have a new Y coordinate while the corresponding x
coordinate remains the same). The regression line will be recomputed and plotted in a new position. The original line will
still appear on the graph (in a light green color) for comparing with the refitted line. Notice that this action leaves the
leverage plot unchanged, while the other two plots, as well as the estimates of the parameters, are all updated dynamically
to correspond to the line fitted to the modified data. The **Restore All** button is now used to restore the plots to
their original appearance. Then, a data point closer to the centroid of the x values, say, point 9, is selected similarly
to demonstrate that such cases have comparatively smaller leverage values.

Going back to point 18 or 19, notice that when one of these is moved, the fitted line attempts to "track" its movement more
closely than, say, when point 9 is moved. This can be observed either directly, by following the movement of the line while
the point is being dragged, or indirectly, by noting the changes in the plot of residuals or in the parameter estimates.
This effectively demonstrates that leverage measures the *weight* each point carries in the prediction calculations.
Thus, when the leverage of an observed point is large, the corresponding prediction for that point attempts to move closer
to or "track" the observed point. This forces the fitted line to pass closer to points with higher leverage.
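Both properties of leverage discussed above can be checked with a few lines of arithmetic. The sketch below, in Python rather than Lisp-Stat, uses hypothetical x and y values (not the module's working data set) and the usual simple-regression formula for leverage, h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². It shows that leverages are computed from the x values alone and that dragging a point vertically by some amount moves its fitted value by h_i times that amount.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def leverages(xs):
    """Leverage of each case; note that the responses are not needed."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1.0 / n + (x - x_bar) ** 2 / sxx for x in xs]


def shift_in_fit(xs, ys, i, delta):
    """Change in the fitted value at case i when y_i is moved by delta."""
    moved = list(ys)
    moved[i] += delta
    b0, b1 = fit_line(xs, ys)
    c0, c1 = fit_line(xs, moved)
    return (c0 + c1 * xs[i]) - (b0 + b1 * xs[i])


# Hypothetical data: the last x value lies far from the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 20.0]
ys = [2.0 + 0.5 * x for x in xs]

h = leverages(xs)
p = 2                                  # intercept and slope
cutoff = 2 * p / len(xs)               # the 2p/n reference line
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print("cases above 2p/n (0-based):", flagged)
print("shift in fit at case 9 after moving y by 5:", shift_in_fit(xs, ys, 9, 5.0))
print("shift in fit at case 4 after moving y by 5:", shift_in_fit(xs, ys, 4, 5.0))
```

The far larger shift for the remote case than for the central one is the numerical counterpart of the fitted line "tracking" a high-leverage point.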

Point 18 is deleted by first selecting it and then pushing the **Delete Selection** button. This will remove point 18,
and the straight-line model is refitted to the resulting data. The change in the fit statistics is noted and the point is replaced
by pushing the **Restore All** button. Figure 6 and
Figure 7 display the results of these operations. By repeating the same procedure
with point 19, it will be revealed that fitted values are more sensitive to the deletion of case 19 than of case 18. Case
19 is said to be a more *influential* observation than case 18. This fact is reflected in the Cook's D statistics
computed for the original data and shown in Figure 6; the largest value corresponds
to case 19. Cook's D statistic is a measure of the influence an individual case has on the regression fit.
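As a rough numerical companion to the deletion exercise, Cook's D can be computed directly from the residuals and leverages of the full fit. The sketch below is in Python, not Lisp-Stat, and uses hypothetical data; it assumes the standard definition D_i = (r_i² / p) · h_i / (1 - h_i), where r_i is the internally studentized residual and p = 2 for a straight-line model. Whether the module computes D in exactly this form is not stated in the text.

```python
def fit_line(xs, ys):
    """Least squares (intercept, slope) for simple linear regression."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def cooks_distances(xs, ys):
    """Cook's D for every case of a straight-line fit."""
    n, p = len(xs), 2
    b0, b1 = fit_line(xs, ys)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    resid = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
    s2 = sum(e * e for e in resid) / (n - p)        # residual mean square
    dists = []
    for x, e in zip(xs, resid):
        h = 1.0 / n + (x - x_bar) ** 2 / sxx        # leverage
        r2 = e * e / (s2 * (1.0 - h))               # squared studentized residual
        dists.append(r2 / p * h / (1.0 - h))        # outlier part times leverage part
    return dists


# Hypothetical data: the last two cases are far from the centroid of x,
# and the final one also lies well off the trend of the others.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 14.0, 15.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.3, 6.8, 8.1, 14.2, 11.0]

d = cooks_distances(xs, ys)
most_influential = max(range(len(d)), key=d.__getitem__)
print("Cook's D:", [round(v, 3) for v in d])
print("most influential case (0-based):", most_influential)
```

The same values are obtained by deleting each case in turn, refitting, and summing the squared changes in all fitted values divided by p times the residual mean square, which is the operation the **Delete Selection** button lets students carry out one case at a time.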

Figure 7:

Plotting Y against each of the explanatory variables and computing correlations among them are acceptable as simple tools for studying the relationships among these variables. For example, these may be useful in detecting collinearities involving a pair of variables. However, if incorrectly interpreted, these correlations alone may lead to misleading conclusions regarding the contribution of an explanatory variable to a regression model, particularly in the presence of other variables in the model. Moreover, in multiple regression, collinearities can involve three or more variables. To detect these relationships, simultaneous use of several diagnostic plots may be necessary.

Some of the important concepts related to understanding and interpreting relationships among regression variables that can
be illustrated using the **regteach4** module are:

- A plot of Y against an X variable, called the *partial response plot*, cannot be used alone to explain the contribution of the X variable to the multiple regression model.
- A useful way to display the strength of the relationship between Y and an X variable in a model is to plot these
against each other after removing the linear effects of the other explanatory variables from each. This plot is called
the *added-variable plot* or the *partial regression plot*.
- Added-variable plots are useful in directly determining how individual cases affect the estimation of the corresponding
regression parameters.
- Finding the best linear fit is difficult when the predictors are highly correlated. This is reflected in high
*variance inflation factors* (VIFs) for some parameter estimates in the fitted model.

Figure 8 shows a (Y,X1,X2) spin-plot with buttons labelled **Pitch**, **Roll**,
and **Yaw** which can be used to rotate the 3-dimensional point cloud around any one of three fixed axes. Using the
**Yaw** button, the point cloud can be rotated around the Y axis until the strongest fit to a straight line is observed
on the plane of the computer screen. This corresponds to the fitting of a least squares plane by eye, and the line is the
2-dimensional view of that plane.

Frame 1 | Frame 2 |

Figure 8: Two Frames of the

In **regteach4** a 2- or a 3-variable regression model with predetermined values for the X variables may be selected by
pressing the corresponding button to illustrate concepts described above. The user has some control over how the data for
the X variables are generated; the correlation between the X1 and X2 variables can be specified using the **Corr**
slider-bar. For example, in Figure 8 the correlation has been set to 0.40 as can be
observed in the scatterplot of X1 vs. X2. The symbols V, H, and O identify the variables plotted on each axis. When the
**3-Var** button is pushed, the user is given the choice of selecting the pair of independent variables to be plotted on
the H and O axes by pushing one of the buttons X1&X2, X2&X3, or X1&X3.

The spin-plot gives the user the ability to observe various 2-dimensional projections of the data, as demonstrated in
Figure 8. In particular, the more interesting projections are those that will
coincide with the partial response plots. For instance, the projection shown on the spin-plot in
Figure 9 is identical to the plot of Y vs. X1 shown in the upper left corner
of the *scatterplot matrix*. The scatterplot matrix is used in the **regteach4** module to display the relevant
partial response plots simultaneously. Using the **Yaw** button the spin-plot can be rotated to obtain another
projection that is identical to the plot of Y vs. X2, i.e., the middle plot in the top row of the scatterplot matrix.

Figure 9: Two Frames of the

In addition to the partial response plots, the scatterplot matrix also displays all pairwise plots among the X-variables.
Pressing the **Scatterplot Matrix** button produces this plot, displayed here in Figures 9
through 12. Pressing the **Added Variable** button results in the appearance of the added-variable plots shown in
Figures 10, 11 and 12.

Figure 10:

Figures 10 and 11 show examples where the bivariate plots described above have been constructed for samples in which predictors X1 and X2 were generated with correlations of 0.04 (inducing low multicollinearity) and 0.96 (inducing high multicollinearity), respectively. The partial response plots in Figure 10 indicate a strong linear relationship between Y and X2 and a weak linear relationship between Y and X1. However, the added-variable plots show that each of these relationships is strongly linear when the linear effect of the other variable is removed from the regression. Conversely, as evident from Figure 11, lack of a linear relationship in the added-variable plot does not imply that the corresponding (Y,X) variables are independent; rather it could be that in the presence of the other variables, the X variable plotted does not provide any additional explanatory power to the fitted regression model.
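The construction of an added-variable plot can be sketched for a two-predictor model. The Python code below is an illustration under stated assumptions, not the module's Lisp-Stat code: the data are simulated with a hypothetical model and seed, and the helper names are invented for this example. It regresses both Y and X1 on X2, keeps the two sets of residuals as the plot coordinates, and confirms that the slope through those residuals equals the coefficient of X1 in the full two-predictor fit.

```python
import random


def center(v):
    """Subtract the mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residuals_on(v, x):
    """Residuals from the simple regression (with intercept) of v on x."""
    vc, xc = center(v), center(x)
    slope = dot(xc, vc) / dot(xc, xc)
    return [a - slope * b for a, b in zip(vc, xc)]


def added_variable(y, x1, x2):
    """Plot coordinates and slope of the added-variable plot for x1 given x2."""
    e_x1 = residuals_on(x1, x2)      # x1 adjusted for x2 (horizontal axis)
    e_y = residuals_on(y, x2)        # y adjusted for x2 (vertical axis)
    return e_x1, e_y, dot(e_x1, e_y) / dot(e_x1, e_x1)


def two_predictor_coefs(y, x1, x2):
    """Slopes (b1, b2) of y on x1 and x2, from the centered normal equations."""
    yc, c1, c2 = center(y), center(x1), center(x2)
    s11, s22, s12 = dot(c1, c1), dot(c2, c2), dot(c1, c2)
    s1y, s2y = dot(c1, yc), dot(c2, yc)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical simulated data with moderately correlated predictors.
rng = random.Random(12)
x2 = [rng.gauss(0.0, 1.0) for _ in range(40)]
x1 = [0.5 * a + rng.gauss(0.0, 1.0) for a in x2]
y = [1.0 + 2.0 * a + 3.0 * b + rng.gauss(0.0, 0.5) for a, b in zip(x1, x2)]

e_x1, e_y, av_slope = added_variable(y, x1, x2)
b1, b2 = two_predictor_coefs(y, x1, x2)
print(f"slope in added-variable plot: {av_slope:.4f}")
print(f"coefficient of X1 in full fit: {b1:.4f}")
```

Plotting `e_y` against `e_x1` gives the added-variable plot itself. Increasing the dependence of `x1` on `x2` in the simulation shrinks the horizontal spread of `e_x1`, which is how high multicollinearity shows up in such a plot.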

Figure 11:

Figure 12 displays an example of a 3-variable model fitted to data where the
variables X1 and X2 are generated to be highly correlated with each other. By examining the added-variable plots it
becomes evident that X1 does not contribute additional predictive information to the regression in the presence of X2 and
X3 in the model. No linear trend is apparent in the first added-variable plot in Figure 12,
showing very clearly that fitting a model with both X1 and X2 will cause either or both estimated coefficients
to possess a large sampling variance. The VIFs for the coefficients b_{1} and b_{2} are relatively large,
indicating the usefulness of the VIF as a direct measure of the effect of multicollinearity in model estimation. The VIFs
are displayed when the **Regression Stat** button is selected.
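The link between predictor correlation and the VIF is easiest to see in the two-predictor case, where both slopes share the value 1 / (1 - r²) with r the correlation between X1 and X2. The short Python sketch below covers only that simplified case, which is an assumption made to keep the example small. For the three-variable model of Figure 12, each VIF would instead use the R² from regressing that predictor on the other two.

```python
def vif_two_predictors(r):
    """VIF shared by both slopes when the two predictors have correlation r."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Correlation settings mentioned in the text for the Corr slider examples.
for r in (0.04, 0.40, 0.96):
    print(f"corr = {r:.2f}  VIF = {vif_two_predictors(r):.2f}")
```

A correlation of 0.96 gives a VIF of about 12.8, meaning the sampling variance of each slope estimate is roughly 12.8 times what it would be with uncorrelated predictors of the same spread.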

Figure 12:

Two conditions in regression are that the mean response is a linear function of unknown regression coefficients and that
the errors are additive and are a random sample from a normal distribution with constant standard deviation. Graphical
tools that help to check these conditions using the fitted model are generally known as *residual plots*. To check
the normality condition one would use a normal probability plot of either the raw residuals or studentized residuals. For
checking homogeneity of error variances, plots of residuals against the predicted values and residuals against each
explanatory variable (X_{i}) are useful. When patterns in these plots indicate deviations from model conditions,
one remedy often advocated is to attempt transforming (or re-expressing) the variables to better satisfy these conditions.
Transforming the response variable using the Box-Cox power family of transformations is often recommended to restore
normality. More traditional transformations that are designed to achieve constant error variances also promote normality
in some instances.

The **regteach5** module has been constructed to illustrate some of the important concepts related to the use of
residual plots and transformations in regression analysis:

- A normal probability plot of the residuals can be used to identify features of the shape of their distribution such as
skewness and whether it is long-tailed or short-tailed.
- Certain patterns in the plot of residuals against the predicted values can be used to identify the form of dependence of
the error variance on the mean of Y. This plot will be called the *residual plot* below.
- Variance stabilizing transformations achieve more nearly constant error variances. These may also help restore
normality to the data in some instances.
- The Box-Cox power transformation can also be used to transform Y so that the transformed data may be adequately
described by a normal distribution.

Figure 13 shows the start-up window of the **regteach5** module. In the example
displayed, the **Select Data Set** button has been used to select a data set and the **Normal Plot** button employed
to obtain the corresponding normal probability plot.

Figure 13:

The start-up window also contains a **Simulation** button that can be used to generate *data simulation windows*:
a window to generate simulated data and a window to display the corresponding normal probability plot (see
Figure 14). While in this set-up, the module is said to be in the *simulation
mode*. In this mode, random samples may be drawn from distributions with various shapes. By observing normal
probability plots of these data sets, students learn to associate types of deviations from a straight-line pattern with the
shapes of the underlying distributions.

Selecting an item from the **Distributions** pulldown menu initiates the creation of a normal probability plot of a
random sample of 30 data points drawn from one of three distributions: N(0,10), Chi-squared (3 d.f.) and Student's t (3 d.f.).
A plot of the density of the selected distribution overlaid with a dot plot of the actual sample data drawn is displayed in
the first simulation window. New samples can be drawn by pushing the **New Sample** button repeatedly or by holding it
down using the mouse. Figure 14 shows examples of plots for samples drawn from
each of the three distributions above.

Figure 14: Panels from left to right: Normal, Chi Square, t Distribution.
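The simulation mode described above can be imitated outside the module. The following is a minimal Python sketch, not the module's Lisp-Stat code: the function names `draw_sample`, `normal_plot_points`, and `straightness` are hypothetical, N(0,10) is assumed to mean variance 10, and the Blom plotting positions are one common convention that the module may or may not use. The correlation between the two plot coordinates is offered only as a rough numerical stand-in for judging a straight-line pattern by eye.

```python
import math
import random
from statistics import NormalDist


def draw_sample(name, n=30, rng=random):
    """Draw n values from one of the three distributions named in the text."""
    if name == "normal":
        # Assumption: N(0,10) is read as mean 0 and variance 10.
        return [rng.gauss(0.0, math.sqrt(10.0)) for _ in range(n)]
    if name == "chisq3":
        # Chi-squared with k d.f. is a gamma distribution with shape k/2 and scale 2.
        return [rng.gammavariate(1.5, 2.0) for _ in range(n)]
    if name == "t3":
        # Student's t with k d.f. is Z / sqrt(V / k), with V chi-squared on k d.f.
        return [rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(1.5, 2.0) / 3.0)
                for _ in range(n)]
    raise ValueError(f"unknown distribution: {name}")


def normal_plot_points(sample):
    """Pair each ordered observation with a standard normal quantile."""
    n = len(sample)
    z = NormalDist()
    # Blom plotting positions (i - 3/8) / (n + 1/4); other conventions exist.
    quantiles = [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(sample)))


def straightness(points):
    """Correlation of the plot coordinates; values near 1 suggest a straight line."""
    xs, ys = zip(*points)
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(12)  # fixed seed so that a rerun reproduces the same samples
for name in ("normal", "chisq3", "t3"):
    pts = normal_plot_points(draw_sample(name, rng=rng))
    print(f"{name:8s} straightness = {straightness(pts):.3f}")
```

Rerunning the loop with different seeds plays the role of the **New Sample** button: with only 30 points, the plots for a given distribution vary noticeably from sample to sample.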

In the *transformation mode*, the **regteach5** module allows the user to perform data transformations on a
selected data set. After choosing a data set by pressing the **Select Data Set** button, the user can select a variance
stabilizing scheme from a pulldown menu by pressing the **Var Stbl Trans** button. By observing the changes made
dynamically in the *residual plot* and the *normal probability plot* in response to each transformation
attempted, the user will be able to select a transformation that adequately stabilizes the variance and brings the data
closer to normality. In addition, the **Power** and **Shift** slider-bars allow the user to try Box-Cox transformations of
the original data by choosing various power and shift values. The residual plot and the normal probability plot can be
saved for comparison among satisfactory transformations by pushing the **Save Plot** button.
Figure 15 and Figure 16 show an example of
each of the above plots saved for such a comparison.
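As a rough illustration of what the **Power** and **Shift** sliders control, the sketch below applies a shifted Box-Cox power transformation in Python. It is an assumption that the module adds the shift to the response before taking the power and that it uses the scaled form shown here; the function name `box_cox` and the sample values are hypothetical.

```python
import math


def box_cox(y, power, shift=0.0):
    """Shifted Box-Cox transform: ((y + shift)**power - 1) / power, or log at power 0."""
    shifted = [v + shift for v in y]
    if min(shifted) <= 0:
        # The power family is defined only for positive values; a shift can fix this.
        raise ValueError("all shifted values must be positive")
    if abs(power) < 1e-12:
        # Limiting case of the family as the power tends to 0.
        return [math.log(v) for v in shifted]
    return [(v ** power - 1.0) / power for v in shifted]


# Illustrative responses only; these are not data from the module.
y = [0.0, 1.5, 4.0, 9.0, 25.0]
for power in (1.0, 0.5, 0.3, 0.0):
    transformed = box_cox(y, power, shift=1.0)
    print(power, [round(v, 3) for v in transformed])
```

Because the transformation is monotone increasing for every power, the ordering of the observations is preserved; only the spacing changes, which is what alters the residual plot and the normal probability plot as the sliders move.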

In Figure 15, the residual plot indicates a dependence of the residual variance on
the magnitude of the response, and the probability plot indicates some deviation from normality. The *square root*
transformation appears to restore normality and stabilize the variance to some extent, as evidenced by the bottom set of
frames.

Figure 15

Figure 15: Saved Plot Windows: Variance Stabilizing Transformation of the Response.
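Why a square root transformation can stabilize variance may be seen in a small simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the data in Figure 15: it uses Poisson counts, for which the variance equals the mean, and compares the within-group standard deviation at several group means before and after taking square roots. The helper names `poisson` and `group_sds` and the chosen means are hypothetical.

```python
import math
import random
import statistics


def poisson(mean, rng):
    """Draw one Poisson count by Knuth's multiplication method (fine for moderate means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def group_sds(means, n, transform, rng):
    """Standard deviation of the transformed responses within each group of n draws."""
    return [statistics.stdev([transform(poisson(m, rng)) for _ in range(n)])
            for m in means]


rng = random.Random(2004)  # fixed seed so that a rerun reproduces the same draws
means = [10, 40, 100]
raw_sd = group_sds(means, 500, float, rng)
root_sd = group_sds(means, 500, math.sqrt, rng)
for m, a, b in zip(means, raw_sd, root_sd):
    print(f"mean {m:3d}: SD of y = {a:.2f}, SD of sqrt(y) = {b:.2f}")
```

On the original scale the spread grows with the group mean, which is the fan-shaped pattern a residual plot would show; on the square root scale the spread should be roughly the same in every group.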

In Figure 16, the residual plot shows both curvature and nonconstant variance, although the residuals do not exhibit significant deviation from normality. As shown in the bottom set of frames, a power transformation with power 0.3 appears to stabilize the variance while retaining normality.

Figure 16

Figure 16: Saved Plot Windows: Box-Cox Power Transformation of the Response.

The instructional modules and Lisp-Stat source code for the software components of our modules are available via anonymous
ftp from Iowa State University. To obtain these, use the command **ftp isua.iastate.edu** with "anonymous.stat" as the
username and "yourusername@your.email.host" as the password. This should get you into the statistics directory named
"anonymous". The subdirectories "Teach", "RegTeach", and "DsnTeach" contain Readme files describing three sets of software.
If you are accessing files from a Unix host, ftp the compressed archive "regteach.tar.gz" to obtain a version that can be
installed on the Unix platform. Otherwise obtain the file appropriate for your platform (e.g., regteach.exe for PC/Windows
and regteach.sea.hqx for older Macs). These files are binary archives that unbundle when executed. Other Readme/Install
files will be found in each package after unbundling.

PDF-formatted files of the current version of this paper and of the instructional module lessons designed for use with the software modules are available in the subdirectory /Docs. The subdirectories /Lessons and /Figures contain the original LaTeX files and corresponding figures used in the lesson documents.

Lisp-Stat is freely available from **umnstat.stat.umn.edu** or from **statlib**. It is recommended that Mac OS X
users adapt the Unix versions of the software; instructions to do this are available in a help file in the shell archive.
Our software currently runs under Version 2.1 Release 3.52 of Lisp-Stat. We plan to make this software available from other
servers on the internet such as statlib and the UCLA statistics archive.

In this article we have extended the collection of instructional modules described in Marasinghe et al. (1996). As with the modules described in Iversen and Marasinghe (2001), the ones described here are specific to regression analysis and are much more advanced, elaborate, and complex than the ones describing elementary statistical concepts. One tenet we have attempted to follow in developing these modules is to obtain ideas and feedback from those instructors involved in teaching these topics. We hope that the research presented here will lead to exploration, refinement, and dissemination of other such modules for teaching statistics interactively. We welcome comments from potential users of these modules.

We are grateful to the statistics instructors who used the earlier sets of modules in their teaching and sent us comments and words of encouragement. The current set of modules was developed for use in courses where the primary audience is undergraduate and graduate students from disciplines other than statistics. We hope that they will enable these students to obtain an improved understanding of the underlying statistical concepts.

Anderson, J. E., and Dayton, J. D. (1995), "Instructional Regression
Modules using XLISP-STAT," *Journal of Statistics Education* [On line], **3**(1).
jse.amstat.org/v3n1/anderson.html

Becker, R. A., Cleveland, W., and Wilks, A. (1987), "Dynamic Graphics
for Data Analysis," *Statistical Science*, **2**(4), 355-395.

Cleveland, W. S., and McGill, R. (eds.) (1988), *Dynamic Graphics*,
New York: Chapman and Hall.

Cook, R. D., and Weisberg, S. (1989), "Regression Diagnostics with
Dynamic Graphics (with discussion)," *Technometrics*, **31**,
277-311.

Cook, R. D., and Weisberg, S. (1994), *An Introduction to Regression
Graphics*, New York: Wiley.

Cook, R. D., and Weisberg, S. (1999), *Applied Regression Including
Computing and Graphics*, New York: Wiley.

Darius, P., Michiels, S., Raeymaekers, B., Ottoy, J-P., and Thas,
O. (2002), "Applets for Experimenting with Statistical Concepts," in
*Proceedings of the Sixth International Conference on Teaching
Statistics*, Cape Town, South Africa.

de Leeuw, J. (1994), "The Lisp-Stat Statistical Environment,"
*Statistical Computing and Graphics Newsletter*, **5**(3), 13-17.

Iversen, P., and Marasinghe, M. G. (2001), "Dynamic
Graphical Tools for Teaching Experimental Design and Analysis
Concepts," *The American Statistician*, **55**(4), 345-351.

Marasinghe, M. G., Meeker, W. Q., Cook, D., and Shin, T. (1996), "Using
Graphics and Simulation to Teach Statistical Concepts," *The
American Statistician*, **50**(4), 342-351.

Nurhonen, M., and Puntanen, S. (1992), "Illustrating Regression
Concepts," *Teaching Statistics*, **14**(1), 20-23.

Saunders, D. J. (1986), "Computer graphics and animations for teaching
probability and statistics," *International Journal of
Mathematical Education in Science and Technology*, **17**, 561-568.

Tierney, L. (1991), *Lisp-Stat*, New York: Wiley.

Trumbo, B. E. (1994), "Some Demonstration Programs for Use in Teaching
Elementary Probability: Part 1 and 2," *Journal of Statistics
Education* [On line], **2**(2).
jse.amstat.org/v2n2/trumbo.html

Velleman, P. (2004), *ActivStats 2003-2004 Release*, Addison-Wesley.

Mervyn Marasinghe

Department of Statistics

Iowa State University

Ames, IA 50011-1210

USA
*mervyn@iastate.edu*

William M. Duckworth

Department of Statistics

Iowa State University

Ames, IA 50011-1210

USA
*wmd@iastate.edu*

Tae-Sung Shin

StatSoft Inc.

2300 E. 14th St.

Tulsa, OK 74104
*sts@statsoft.com*
