Dexter C. Whittinghill
Rowan University
Robert V. Hogg
University of Iowa
Journal of Statistics Education Volume 9, Number 2 (2001)
Copyright © 2001 by Dexter C. Whittinghill and
Robert V. Hogg, all rights reserved.
This text may be freely shared among individuals, but it
may not be republished in any medium without express
written consent from the authors and advance notification
of the editor.
Key Words: Confidence intervals; Efficiency; Estimation; Maximum likelihood; Sufficiency; Tests of hypotheses.
We explore the varied uses of the uniform distribution on $[\theta - 1/2, \theta + 1/2]$ as an example in the undergraduate probability and statistics sequence or the mathematical statistics course. Like its cousin, the uniform distribution on $(0, \theta)$, this density provides tractable examples from the topic of order statistics to hypothesis tests. Unlike its cousin, which appears in many probability and statistics books, this uniform is less well known or used. We discuss maximum likelihood estimators, likelihood ratio tests, confidence intervals, joint distributions of order statistics, use of Mathematica®, sufficiency, and other advanced topics. Finally, we suggest a few exercises deriving likelihood ratio tests when the range is unknown as well, or for the uniform on $[\theta - \rho, \theta + \rho]$.
Anyone who has taught undergraduate probability and statistics knows that it is sometimes difficult to find good examples of families of density functions that can be used in the classroom or for challenging homework for the better students. The 'common' families of one-parameter probability functions (such as the geometric or exponential) are almost surely used as an example or in one of the textbook problems. When trying to 'cook up' an example, often the instructor finds that the author of the book has already used the recipe! For illustration, we often think of a uniform density on $(0, \theta)$, but it is in nearly every probability and statistics text. When instructors get more creative ('the author will never have thought of this!'), they find that their wonderful mass or density function has a maximum likelihood estimator with a nearly intractable distribution. Exasperation sets in soon thereafter.
Here is a simple distribution that appears in two textbooks by the second author. It is also in Miller and Miller (1999) and Mood, Graybill, and Boes (1974), but not in many other texts. It is the uniform distribution on $[\theta - 1/2, \theta + 1/2]$, with density

$$f(x;\theta) = I_{[\theta - 1/2,\ \theta + 1/2]}(x), \qquad -\infty < \theta < \infty, \tag{1}$$

where $I_{[a,b]}(x)$ is an indicator function that is 1 when the argument is in the interval, and 0 when it is not. Other texts sometimes use the uniform on another interval of fixed length whose location depends on $\theta$, but this is essentially the same function. Often more interesting problems can be created when the uniform distribution is on $[\theta - \rho, \theta + \rho]$, with both $\theta$ and $\rho$ unknown, and we will discuss these briefly.
The density (1) is a location-parameter alternative to the uniform on $(0, \theta)$, and it provides a rich assortment of material for discussions or examples. There are many unbiased and consistent estimators of $\theta$ to compare, including the familiar $\bar{X}$ as the method of moments estimator. The rather simple likelihood function yields an interesting uncountable set of maximum likelihood estimators. The density also provides fairly simple distributions for the order statistics. Using a modified likelihood ratio test, sensible and intuitive tests of hypotheses and confidence intervals can easily be derived. Finally, for more advanced students, the sufficiency results are not straightforward (we have joint sufficiency), and completeness and minimal sufficiency can also be discussed.
When teaching probability and statistics, often the
greatest need for help is in finding examples when the
course reaches 'parametric point estimation.' That is, we
want to estimate the value of the unknown parameter or
parameters in a family of distributions, and to make
statistical inferences we need to know the distributions of
the estimators. Before presenting the flexibility of the uniform distribution on $[\theta - 1/2, \theta + 1/2]$, we begin with a discussion of how such a model might arise.
Consider the situation where you are waiting to catch a bus. You want to estimate $\theta$, the average number of minutes that the bus takes to travel from the previous stop to your stop (you can see that other stop up the road). Each day you can take an observation on how long the bus takes and use the observations to estimate $\theta$. Let us assume that under 'usual conditions' the variation induced by the traffic causes the bus to take between $\theta$ minus and $\theta$ plus a fixed unit of time, say one-half a minute for convenience, to arrive at your stop. Moreover, it is reasonable to assume that the distribution is uniform. If we let the random variable X be the number of minutes for the trip, then X has a probability density function (density) given by (1).
We note two things. First, the interval can be open or closed, and we will use a closed interval to avoid discussion of supremum versus maximum. Second, it is an easy homework problem for students to show that the mean of the distribution is $\theta$, the variance is 1/12, and the cumulative distribution function is

$$F(x;\theta) = \begin{cases} 0, & x < \theta - 1/2, \\ x - \theta + 1/2, & \theta - 1/2 \le x \le \theta + 1/2, \\ 1, & x > \theta + 1/2. \end{cases}$$

The joint density for a random sample of n observations is the deceptively simple

$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\ \theta + 1/2]}(x_i) = I_{[\theta - 1/2,\ \theta + 1/2]}(y_1)\, I_{[\theta - 1/2,\ \theta + 1/2]}(y_n),$$

where $y_1 \le y_2 \le \cdots \le y_n$ are the order statistics. If an instructor does not stress the derivation of distributions of order statistics (we recommend covering it), the distributions of the minimum Y1 and maximum Yn, along with their joint distributions, can be given to the students. The students can also simply work with data. (See Hogg and Tanis 1997, problems 6.1.15 and 10.1.11.)
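The homework facts above (mean $\theta$, variance 1/12, and the form of the distribution function) can be checked numerically. The following is a minimal sketch using only the Python standard library; the value `theta = 7.0`, the seed, and the helper names `cdf` and `draw_sample` are illustrative assumptions, not part of the original article.

```python
import random
import statistics

def cdf(x, theta):
    """Distribution function of the uniform on [theta - 1/2, theta + 1/2]."""
    if x < theta - 0.5:
        return 0.0
    if x > theta + 0.5:
        return 1.0
    return x - theta + 0.5

def draw_sample(theta, n, rng):
    """Draw n independent observations from the density (1)."""
    return [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]

rng = random.Random(2001)   # fixed seed so the run is reproducible
theta = 7.0                 # hypothetical mean travel time, in minutes
xs = draw_sample(theta, 100_000, rng)

print("sample mean     :", statistics.fmean(xs))       # close to theta
print("sample variance :", statistics.variance(xs))    # close to 1/12
print("F(theta + 0.25) :", cdf(theta + 0.25, theta))   # three quarters of the mass
```

With a large simulated sample the printed mean and variance should sit close to $\theta$ and 1/12, which gives students a quick sanity check on their hand calculations.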
In general, we present our students with different ways of finding estimators in probability and statistics. Because $\theta$ is the mean of the distribution for X, our students usually suggest $\bar{X}$ as an estimator (which we will call W1). This intuitive result is the method of moments estimator of $\theta$ and leads nicely into a discussion of that topic. $\bar{X}$ is also the least-squares estimator of $\theta$. (See Hogg and Tanis 1997, problem 6.1.15.)
If we are lucky, our students will suggest other intuitive estimators, such as the median (which we will call W3) or the midrange, which is the average of the minimum and maximum order statistics, say

$$W_2 = \frac{Y_1 + Y_n}{2}.$$
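For students who work with data, the three intuitive estimators (mean, midrange, median) are easy to compute side by side. Here is a small sketch; the function name `estimators` and the six travel times are hypothetical values chosen only for illustration.

```python
import statistics

def estimators(sample):
    """Return (W1, W2, W3): the sample mean, midrange and median."""
    ys = sorted(sample)               # order statistics y_1 <= ... <= y_n
    w1 = statistics.fmean(ys)         # W1: sample mean
    w2 = (ys[0] + ys[-1]) / 2         # W2: midrange, (Y_1 + Y_n) / 2
    w3 = statistics.median(ys)        # W3: sample median
    return w1, w2, w3

# Hypothetical bus travel times in minutes (sample range below 1 minute).
times = [6.8, 7.3, 7.1, 6.9, 7.4, 7.2]
w1, w2, w3 = estimators(times)
print("mean     W1 =", round(w1, 4))
print("midrange W2 =", round(w2, 4))
print("median   W3 =", round(w3, 4))
```

The three values generally differ for $n > 2$, which is what makes the later comparison of their variances worthwhile.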
The likelihood function for a random sample is the joint density of the sample, but we consider it as a function of $\theta$ for a given sample, not as a function of the sample for a given $\theta$. We seek maximum likelihood estimators (MLEs) because they can possess useful properties. For instance, if $\hat{\theta}$ is an MLE for $\theta$, and h is a function with a single-valued inverse, then $h(\hat{\theta})$ is the MLE of $h(\theta)$. (See Casella and Berger 1990, Theorem 7.2.1, for a more advanced invariance property, where h is any function, not necessarily with a single-valued inverse.)
Also, under certain regularity conditions, the sequence of
MLEs (one for each n) is best asymptotically normal,
or BAN. Textbooks are full of examples for which these
results apply, but they sometimes lack examples that fail
to satisfy regularity conditions. One of the 'charms' of
our example is that it does not meet the regularity
conditions needed for many results.
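The likelihood function of a random sample from a distribution with density (1) is

$$L(\theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\ \theta + 1/2]}(x_i) = I_{[y_n - 1/2,\ y_1 + 1/2]}(\theta). \tag{2}$$

Unlike many other situations, here the MLE of $\theta$ is not unique. A careful look at the likelihood function shows that any statistic $\hat{\theta} = u(X_1, \ldots, X_n)$ satisfying

$$Y_n - 1/2 \le \hat{\theta} \le Y_1 + 1/2$$

is an MLE of $\theta$. This includes the midrange, W2.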
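The non-uniqueness of the MLE can be made concrete with a few lines of code. The sketch below evaluates the 0/1 likelihood and the interval of maximizers $[y_n - 1/2,\ y_1 + 1/2]$ for one sample; the function names and the four data values are hypothetical and were chosen so that the mean and median fall outside that interval.

```python
import statistics

def likelihood(theta, sample):
    """Likelihood of theta: 1 if every observation lies in [theta - 1/2, theta + 1/2], else 0."""
    return 1 if (theta - 0.5 <= min(sample) and max(sample) <= theta + 0.5) else 0

def mle_interval(sample):
    """Every theta in [y_n - 1/2, y_1 + 1/2] maximizes the likelihood."""
    return max(sample) - 0.5, min(sample) + 0.5

# Hypothetical sample with range 0.9, so the interval of MLEs is short.
sample = [6.6, 7.4, 7.45, 7.5]
lo, hi = mle_interval(sample)
midrange = (min(sample) + max(sample)) / 2
mean = statistics.fmean(sample)
median = statistics.median(sample)

print("MLE interval :", (lo, hi))
print("L(midrange)  :", likelihood(midrange, sample))   # midrange is always an MLE
print("L(mean)      :", likelihood(mean, sample))       # 0 for this sample
print("L(median)    :", likelihood(median, sample))     # 0 for this sample
```

For this particular sample the mean and median have likelihood zero, which is the kind of counterexample students can be asked to construct themselves.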
By way of examples, the student should demonstrate that $\bar{X}$ and W3 (the median) are not maximum likelihood estimators of $\theta$ even though they are unbiased. Of course every estimator of the form $a(Y_n - 1/2) + b(Y_1 + 1/2)$, with $a, b \ge 0$ and $a + b = 1$, is an MLE of $\theta$.
Which of our three unbiased estimators,
W1, W2, or
W3, is best? Are there better
estimators? If class discussion has not already led to the
notion that we are searching for unbiased estimators with
small variance, then the idea of minimum variance unbiased
estimators or efficiency can be raised here. When
comparing W1, W2, and
W3, we must calculate the mean and
variance of each of the three estimators. The case of $\bar{X}$ is straightforward, as it is a consequence of standard formulas like $\mathrm{Var}[\bar{X}] = \sigma^2/n$, which here gives $\mathrm{Var}[W_1] = 1/(12n)$. Each of the midrange (W2) and the
median (W3) requires knowledge of
distributions of order statistics. Depending on the level
of the students or the book, the calculations for the
cases involving order statistics can be assigned or not
(with the results being given to the students) in various
combinations. For the better students these are
challenging homework problems, and we note some of the
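difficulties. Because the shifted order statistics $Z_i = Y_i - \theta + 1/2$, $i = 1, \ldots, n$, are the order statistics of a random sample from the uniform distribution on [0,1], the density of $Z_i$ is

$$g_i(z) = \frac{n!}{(i-1)!\,(n-i)!}\,[F(z)]^{i-1}\,[1 - F(z)]^{n-i}\, f(z), \qquad 0 < z < 1,$$

where F and f are the respective distribution and density function of the uniform distribution on [0,1]. Using $F(z) = z$ and $f(z) = 1$, it is easy to show that $g_1(z_1) = n(1 - z_1)^{n-1}$ and $g_n(z_n) = n z_n^{n-1}$, so $E[Z_1] = 1/(n+1)$ and $E[Z_n] = n/(n+1)$. Thus

$$E[W_2] = \frac{E[Z_1] + E[Z_n]}{2} + \theta - \frac{1}{2} = \theta,$$

making W2 unbiased. Also it is an easy exercise to find the variances, namely

$$\mathrm{Var}[Z_1] = \mathrm{Var}[Z_n] = \frac{n}{(n+1)^2 (n+2)}.$$

To find the Var[W2], we need $\mathrm{Cov}[Z_1, Z_n] = E[Z_1 Z_n] - E[Z_1]E[Z_n]$. The joint density for Z1 and Zn is

$$g_{1n}(z_1, z_n) = n(n-1)(z_n - z_1)^{n-2}, \qquad 0 < z_1 < z_n < 1,$$

and hence

$$E[Z_1 Z_n] = \int_0^1 \int_0^{z_n} z_1 z_n\, n(n-1)(z_n - z_1)^{n-2}\, dz_1\, dz_n = \frac{1}{n+2}$$

after some straightforward integration. Thus

$$\mathrm{Cov}[Z_1, Z_n] = \frac{1}{n+2} - \frac{n}{(n+1)^2} = \frac{1}{(n+1)^2 (n+2)}.$$

Finally, the variance of W2 is

$$\mathrm{Var}[W_2] = \frac{\mathrm{Var}[Z_1] + \mathrm{Var}[Z_n] + 2\,\mathrm{Cov}[Z_1, Z_n]}{4} = \frac{1}{2(n+1)(n+2)}.$$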
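Students who are given the variance of the midrange without deriving it can still convince themselves of the result by simulation. The following Monte Carlo sketch compares a simulated variance of W2 with the closed form $1/[2(n+1)(n+2)]$; the sample size, number of replications, seed, and function names are arbitrary illustrative choices.

```python
import random
import statistics

def midrange(sample):
    """W2: the average of the smallest and largest observations."""
    return (min(sample) + max(sample)) / 2

def simulate_midrange_variance(n, reps, rng, theta=0.0):
    """Estimate Var[W2] from `reps` simulated samples of size n."""
    values = [
        midrange([rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.variance(values)

rng = random.Random(9)      # fixed seed for reproducibility
n = 10
simulated = simulate_midrange_variance(n, 20_000, rng)
exact = 1 / (2 * (n + 1) * (n + 2))

print("simulated Var[W2]:", simulated)
print("exact     Var[W2]:", exact)
```

Because $\theta$ only shifts the sample, the choice `theta=0.0` loses no generality; this is the same pivotal idea used in the derivation.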
Remarks. The transformation employed the function $Z_i = Y_i - \theta + 1/2$, which is a pivotal quantity, because it made the distribution of Zi independent of $\theta$, whatever $\theta$ was. Pivotal quantities, generally discussed in the more
rigorous undergraduate texts, are also very helpful in
constructing confidence intervals. (See Casella and Berger
1990 or Mood, Graybill, and Boes 1974 for further
discussion.) Another way of finding Var[W2]
is to first find the distribution of W2
using the 'distribution function technique' (see page 222
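of Hogg and Tanis 1997).
Again in this discussion, we let $Z_i = Y_i - \theta + 1/2$. For $n = 2m + 1$, the median is $W_3 = Y_{m+1} = Z_{m+1} + \theta - 1/2$, and the density of $Z_{m+1}$ is

$$g_{m+1}(z) = \frac{(2m+1)!}{m!\,m!}\, z^{m} (1 - z)^{m}, \qquad 0 < z < 1,$$

which is that of a beta distribution with $\alpha = \beta = m + 1$. Hence $E[Z_{m+1}] = 1/2$; so $E[W_3] = \theta$. Moreover,

$$\mathrm{Var}[W_3] = \mathrm{Var}[Z_{m+1}] = \frac{1}{4(2m+3)} = \frac{1}{4(n+2)}.$$

If the students have not studied the beta distribution,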
a computer algebra system (CAS) such as Mathematica®
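or Maple® can be used to establish the results.
For n = 2m, the median is

$$W_3 = \frac{Y_m + Y_{m+1}}{2} = \frac{Z_m + Z_{m+1}}{2} + \theta - \frac{1}{2}.$$

Moreover, the marginal densities of Zm and Zm+1 are, respectively,

$$g_m(z) = \frac{(2m)!}{(m-1)!\,m!}\, z^{m-1} (1 - z)^{m}, \qquad 0 < z < 1,$$

and

$$g_{m+1}(z) = \frac{(2m)!}{m!\,(m-1)!}\, z^{m} (1 - z)^{m-1}, \qquad 0 < z < 1.$$

These are beta distributions with $\alpha = m$, $\beta = m + 1$, and $\alpha = m + 1$, $\beta = m$, respectively. Hence $E[Z_m] = m/(2m+1)$ and $E[Z_{m+1}] = (m+1)/(2m+1)$, so that $E[W_3] = \theta$. Also,

$$\mathrm{Var}[Z_m] = \mathrm{Var}[Z_{m+1}] = \frac{m(m+1)}{(2m+1)^2 (2m+2)}.$$

As before, to find Var[W3] we must evaluate

$$E[Z_m Z_{m+1}] = \int_0^1 \int_0^{z_{m+1}} z_m z_{m+1}\, \frac{(2m)!}{[(m-1)!]^2}\, z_m^{m-1} (1 - z_{m+1})^{m-1}\, dz_m\, dz_{m+1} = \frac{m(m+2)}{(2m+1)(2m+2)}.$$

The integration is straightforward, or a CAS can be used. Thus

$$\mathrm{Cov}[Z_m, Z_{m+1}] = \frac{m(m+2)}{(2m+1)(2m+2)} - \frac{m(m+1)}{(2m+1)^2} = \frac{m^2}{(2m+1)^2 (2m+2)}.$$

Finally, algebra yields

$$\mathrm{Var}[W_3] = \frac{\mathrm{Var}[Z_m] + \mathrm{Var}[Z_{m+1}] + 2\,\mathrm{Cov}[Z_m, Z_{m+1}]}{4} = \frac{n}{4(n+1)(n+2)}.$$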
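In place of a CAS, exact rational arithmetic can confirm the algebra for the variance of the median. The sketch below relies on the standard covariance formula for order statistics of a uniform(0,1) sample, $\mathrm{Cov}[Z_i, Z_j] = i(n-j+1)/[(n+1)^2(n+2)]$ for $i \le j$, which is a textbook result assumed here rather than taken from the article; the function names are illustrative.

```python
from fractions import Fraction

def cov_uniform_order_stats(i, j, n):
    """Cov[Z_i, Z_j] for i <= j, order statistics of n uniform(0,1) draws (standard result)."""
    return Fraction(i * (n - j + 1), (n + 1) ** 2 * (n + 2))

def median_variance(n):
    """Exact variance of the sample median W3, built from order-statistic moments."""
    if n % 2 == 1:
        k = (n + 1) // 2                      # the median is the single middle order statistic
        return cov_uniform_order_stats(k, k, n)
    m = n // 2                                # the median averages Z_m and Z_{m+1}
    total = (cov_uniform_order_stats(m, m, n)
             + cov_uniform_order_stats(m + 1, m + 1, n)
             + 2 * cov_uniform_order_stats(m, m + 1, n))
    return total / 4

def closed_form(n):
    """Closed forms: 1/[4(n+2)] for odd n, n/[4(n+1)(n+2)] for even n."""
    if n % 2 == 1:
        return Fraction(1, 4 * (n + 2))
    return Fraction(n, 4 * (n + 1) * (n + 2))

checks = [median_variance(n) == closed_form(n) for n in range(1, 51)]
print("closed forms agree for n = 1..50:", all(checks))
```

Using `Fraction` avoids rounding error, so agreement here is an exact identity for each n tested rather than a numerical approximation.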
In most cases you would not make the students do all of
the work indicated above, and again Mathematica® or
Maple® could be used. However, after they have
suggested the various estimators, and found the mean and
variance of at least two (including $\bar{X}$), an instructor could
present all of the variances and have the students compare
them by calculating the relative efficiency of one
estimator to another. Having the students derive or check
some of the efficiency results requires them to work with
inequalities, something they may not have done for quite
some time in their undergraduate careers.
Table 1 gives the variances and shows the relative efficiencies of the estimators $\bar{X}$ (W1), the midrange (W2), and the median (W3). It contains some interesting results. Of course, for $n = 1$ and $n = 2$ the three estimators coincide, so their variances agree.
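Table 1. Variances and Relative Efficiencies of the Mean, Median, and Midrange
Estimator: | W1, mean | W2, midrange | W3, median |
Variance: | $\frac{1}{12n}$ | $\frac{1}{2(n+1)(n+2)}$ | n even: $\frac{n}{4(n+1)(n+2)}$; n odd: $\frac{1}{4(n+2)}$ |
Efficiency of W1 to above: | 1 | $\frac{6n}{(n+1)(n+2)}$ | n even: $\frac{3n^2}{(n+1)(n+2)}$; n odd: $\frac{3n}{n+2}$ |
Efficiency of W2 to above: | $\frac{(n+1)(n+2)}{6n}$ | 1 | n even: $\frac{n}{2}$; n odd: $\frac{n+1}{2}$ |
In this table the efficiency of an estimator to the one heading a column is taken to be the ratio of the column estimator's variance to its own variance, so values above 1 favor the estimator named in the row.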
NOTE: See Hogg and Tanis (1997), problem 10.1.11, for a
comparison of the mean, median, and midrange for
Remark.
If the median is to be used for an odd sample size, don't throw away an observation! Statisticians do not recommend throwing away or ignoring any data. Instead, for a sample of size
Finally, when discussing the Cramér-Rao Lower Bound, this example can be used as one where the conditions of the theorem are not met: the support of the density depends on the parameter being estimated. The students can also show that all three estimators are consistent; each is unbiased and its variance converges to zero as n increases. In that regard, it is interesting to note that the variances of the mean and median are of order $1/n$, while that of the midrange is of order $1/n^2$. That is, for large n, the variance of the midrange is much smaller.
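The different rates at which the variances shrink can be tabulated directly from the closed forms $1/(12n)$ for the mean, $1/[2(n+1)(n+2)]$ for the midrange, and $1/[4(n+2)]$ (n odd) or $n/[4(n+1)(n+2)]$ (n even) for the median. The sketch below is one way to do this with exact fractions; the function name `variances` and the chosen sample sizes are illustrative, and the ratio printed in the last column is Var[mean]/Var[midrange], a convention adopted here for the example.

```python
from fractions import Fraction

def variances(n):
    """Exact variances of the mean (W1), midrange (W2) and median (W3) for sample size n."""
    var_mean = Fraction(1, 12 * n)
    var_midrange = Fraction(1, 2 * (n + 1) * (n + 2))
    if n % 2 == 0:
        var_median = Fraction(n, 4 * (n + 1) * (n + 2))
    else:
        var_median = Fraction(1, 4 * (n + 2))
    return var_mean, var_midrange, var_median

print("n   n*Var[W1]   n*Var[W3]   n^2*Var[W2]   Var[W1]/Var[W2]")
for n in (1, 2, 5, 10, 100, 1000):
    v_mean, v_mid, v_med = variances(n)
    print(n,
          round(float(n * v_mean), 4),       # stays at 1/12: order 1/n
          round(float(n * v_med), 4),        # approaches 1/4: order 1/n
          round(float(n * n * v_mid), 4),    # approaches 1/2: order 1/n^2
          round(float(v_mean / v_mid), 2))   # grows roughly like n/6
```

The scaled columns level off while the last column keeps growing, which is the numerical face of the statement that the midrange variance is of smaller order.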
The topic of sufficiency is very important for finding
good estimators, but it can be very difficult for students
to grasp. It is important because sufficient statistics
are associated with good estimators. Unique MLEs are
functions of sufficient statistics, if they exist (see Rice
1995, p. 284, Corollary A). Unbiased estimators that are
functions of sufficient statistics have smaller variances,
and unbiased estimators derived from sufficient statistics
with 'complete' families are uniformly minimum variance
unbiased estimators. The concept that a sufficient
statistic contains all of the information from the sample
necessary for estimating the parameter sounds simple and
intuitive. However, the definition of a sufficient
statistic is very technical and not always written
intuitively. Many authors use
Let us assume that sufficiency has been defined in the course, and that a one-statistic-one-parameter factorization theorem like Theorem 1 (p. 318) of Hogg and Craig (1995) has been presented. The density (1) provides a good second or third example for discussing sufficiency. The likelihood function in (2) is:

$$L(\theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\ \theta + 1/2]}(x_i) = I_{[\theta - 1/2,\ \theta + 1/2]}(y_1)\, I_{[\theta - 1/2,\ \theta + 1/2]}(y_n). \tag{5}$$

From this expression, some students will say that you can't rewrite (5) so that there is a factorization, and hence there is no single sufficient statistic u1. Some may in fact come up with the idea that Y1 and Yn are jointly sufficient statistics. Example 1, page 348 of Hogg and Craig (1995), shows that Y1 and Yn are jointly sufficient for $\theta$. They also point out that these are minimally sufficient and state that there can't be one sufficient statistic. So the density of (1) shows the student that not all examples work out nicely. The fact that Y1 and Yn are joint sufficient statistics for $\theta$ explains why the variance of the midrange is so much less than for W1 and W3.
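Joint sufficiency of Y1 and Yn can be illustrated numerically: two samples that share the same minimum and maximum produce the same likelihood for every $\theta$, whatever the interior observations are. The sketch below is only an illustration of that idea; the two samples, the grid of $\theta$ values, and the function names are hypothetical.

```python
def likelihood(theta, sample):
    """Likelihood of theta as a product of indicator densities over all observations."""
    value = 1
    for x in sample:
        value *= 1 if theta - 0.5 <= x <= theta + 0.5 else 0
    return value

def likelihood_from_extremes(theta, y1, yn):
    """The same likelihood written using only the smallest and largest observations."""
    return 1 if (theta - 0.5 <= y1 and yn <= theta + 0.5) else 0

# Two hypothetical samples with the same extremes but different interior values.
sample_a = [6.8, 7.0, 7.1, 7.4]
sample_b = [6.8, 6.9, 7.35, 7.4]
grid = [6.5 + 0.01 * k for k in range(101)]   # theta from 6.5 to 7.5

agree = all(
    likelihood(t, sample_a) == likelihood(t, sample_b) == likelihood_from_extremes(t, 6.8, 7.4)
    for t in grid
)
print("likelihoods agree on the whole grid:", agree)
```

The sample means differ (7.075 versus 7.1125), yet the likelihood cannot tell the two samples apart, which is why the interior observations carry no further information about $\theta$.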
Remark. First, Mood, Graybill, and Boes (1974) give
an alternate definition of sufficiency (Definition 16,
The density (1) is also useful for illustrating other topics. Mood, Graybill, and Boes (1974) show that $\theta$ is a location parameter (example 37, p. 333), and then that W2 is the uniformly smallest mean-squared error estimator in the class of location invariant estimators (example 39, p. 335). Hogg and Craig (1995) essentially show that Y1 and Yn are not complete (example 2, p. 349), and discuss the idea of ancillary statistics (example 5, p. 357). We have noted that the MLE is not unique, as any weighted mean of $Y_n - 1/2$ and $Y_1 + 1/2$ will serve as an MLE. However, none of these is sufficient alone; so no MLE is by itself a minimal sufficient statistic for $\theta$, because to enjoy that property the MLE must be sufficient.
After discussing the properties of good estimators, it
is easy to forget that we wanted to estimate the time the
bus takes to travel between stops. Let us agree that
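We illustrate a better way by first finding a good test of $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$, and then we use the relationship between tests and confidence intervals to find a confidence interval for $\theta$. The likelihood ratio is given by

$$\lambda = \frac{L(\theta_0)}{L(\hat{\theta})},$$

where $\hat{\theta}$ is a maximum likelihood estimator such as W2. Note that $\lambda = 1$ provided $\theta_0 - 1/2 \le y_1 \le y_n \le \theta_0 + 1/2$, but $\lambda = 0$ if $y_1 < \theta_0 - 1/2$ or $y_n > \theta_0 + 1/2$. Thus, strictly following the likelihood ratio criterion, we would reject H0 if $Y_1 < \theta_0 - 1/2$ or $Y_n > \theta_0 + 1/2$. That is, we would accept H0 if $\theta_0 - 1/2 \le Y_1 \le Y_n \le \theta_0 + 1/2$. Clearly, if $H_0: \theta = \theta_0$ is true, we would never reject H0 with this test, and the significance level is $\alpha = 0$. However, we must be concerned about the power of the test when $\theta \ne \theta_0$. This suggests that we could improve the power if we selected a constant c slightly less than 1/2 and accepted H0 if $\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c$; otherwise reject H0.
Let us find c so that the significance level is $\alpha$. Thus we want, when $\theta = \theta_0$, that

$$P(\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c) = (2c)^n = 1 - \alpha.$$

Accordingly, $c = (1 - \alpha)^{1/n}/2$.
Going 'backwards' from this test, the corresponding confidence interval for $\theta$ is

$$[\,Y_n - c,\ Y_1 + c\,].$$

The power of this test is easy for a student to compute by evaluating

$$K(\theta) = 1 - P(\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c;\ \theta).$$

Clearly, when $\theta = \theta_0$, $K(\theta_0) = \alpha$. Of course, $K(\theta)$ is symmetric about $\theta = \theta_0$ and hence we need only consider when $\theta > \theta_0$. If $\theta - 1/2 \ge \theta_0 + c$, then all $X_i \ge \theta_0 + c$ and $K(\theta) = 1$. So when $\theta_0 < \theta < \theta_0 + c + 1/2$,

$$K(\theta) = 1 - \left[\min\left(2c,\ \theta_0 + c + 1/2 - \theta\right)\right]^n.$$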
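The modified test is simple enough to code, which lets students explore its power numerically. The sketch below assumes the acceptance region $\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c$ with $(2c)^n = 1 - \alpha$; the function names, the six travel times, and the values $\theta_0 = 7$ and $\alpha = 0.05$ are hypothetical choices for illustration.

```python
def critical_constant(n, alpha):
    """Constant c with (2c)^n = 1 - alpha, so that the test has level alpha."""
    return 0.5 * (1 - alpha) ** (1 / n)

def confidence_interval(sample, alpha):
    """Interval [Y_n - c, Y_1 + c] obtained by inverting the test."""
    c = critical_constant(len(sample), alpha)
    return max(sample) - c, min(sample) + c

def power(theta, theta0, n, alpha):
    """K(theta) = 1 - P(all X_i fall in [theta0 - c, theta0 + c]) when the true value is theta."""
    c = critical_constant(n, alpha)
    lo = max(theta0 - c, theta - 0.5)     # overlap of the acceptance interval
    hi = min(theta0 + c, theta + 0.5)     # with the support of the density
    overlap = max(0.0, hi - lo)
    return 1 - overlap ** n

times = [6.8, 7.3, 7.1, 6.9, 7.4, 7.2]    # hypothetical travel times in minutes
alpha, theta0 = 0.05, 7.0
c = critical_constant(len(times), alpha)
ci_lo, ci_hi = confidence_interval(times, alpha)

print("c                   :", round(c, 5))
print("confidence interval :", (round(ci_lo, 4), round(ci_hi, 4)))
for shift in (0.0, 0.1, 0.3, 0.6, 1.0):
    print("K(theta0 +", shift, ") =", round(power(theta0 + shift, theta0, len(times), alpha), 4))
```

Two features are worth pointing out to students: the power stays at $\alpha$ for $\theta$ within $1/2 - c$ of $\theta_0$, and the interval $[Y_n - c,\ Y_1 + c]$ is empty whenever the sample range exceeds $2c$.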
We present several exercises when considering the uniform distribution on $[\theta - \rho, \theta + \rho]$, where $\rho > 0$ is unknown. Thus, the parameter space is $\Omega = \{(\theta, \rho): -\infty < \theta < \infty,\ \rho > 0\}$. These are analogous to problems associated with the normal distribution when $\sigma^2$ is unknown, and bright students can often think of the two-sample and k-sample problems on their own (possibly with a little direction from the instructor). In each of the following situations, the student is asked to show that the likelihood ratio $\lambda$ equals the form given. Students may find these 'easy' or 'hard' depending on their insight.
Then
where $Y_1 \le Y_2 \le \cdots \le Y_n$ are the order statistics of a random sample of size n.
Using the same notation as in Exercise 6.1,
For convenience, let
Using the same assumptions and notation of Exercise 6.3,
For convenience, let all k independent random samples have the same sample size n, and be the order statistics of the ith sample. Then
Using the same assumptions and notation of Exercise 6.5,
To determine the significance level of each of the above tests that rejects H0 if $\lambda \le \lambda_0$, we need to determine the distribution of $\lambda$ or some appropriate function of $\lambda$. These are much more difficult exercises.
First we must thank the referees who made great suggestions, and one of whom caught a major error in one of our statements. We certainly appreciate their time and effort. The estimator in the Remark in Section 3.3 is theirs. We also thank the other authors whom we have referenced and who have dabbled with the interesting, one-parameter example. Their work has contributed to this paper.
Although the first author was dismayed when he found out that 'his' wonderful example was already in some of the literature (some published over 40 years ago; see Hogg and Craig (1956) and the 1959 edition of their book), he realized that he was not the only person to think this example was a good one. He then contacted the second author about presenting many of the applications of these uniform distributions in one document. When the first author later found the example in Mood, Graybill, and Boes (1974), a book he had used at least eleven years prior to his 'creating' the example, he almost thought that he had subconsciously plagiarized! However, with the continued rise in importance of statistics education and the fact that no single textbook has used this uniform distribution to any extent, the authors felt the project was still worthwhile.
The authors have one request: readers who find uniform distributions of the type considered here, referenced in any book or article, should contact the first author with the references. This is especially true of those who have written the book or article themselves! We have looked at many texts, but certainly not all of them.
Casella, G., and Berger, R. L. (1990), Statistical Inference, Belmont, CA: Duxbury.
Hogg, R. V., and Craig, A. T. (1956), "Sufficient Statistics In Elementary Distribution Theory," Sankhya, 17, 209.
----- (1995), Introduction to Mathematical Statistics (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Hogg, R. V., and Tanis, E. A. (1997), Probability and Statistical Inference (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Miller, I., and Miller, M. (1999), John E. Freund's Mathematical Statistics (6th ed.), Upper Saddle River, NJ: Prentice Hall.
Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), Introduction to the Theory of Statistics (3rd ed.), NY: McGraw-Hill.
Rice, J. A. (1995), Mathematical Statistics and Data Analysis (2nd ed.), Belmont, CA: Duxbury.
Dexter C. Whittinghill
Department of Mathematics
Rowan University
201 Mullica Hill Rd.
Glassboro, NJ 08028
Robert V. Hogg
Department of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242