Dexter C. Whittinghill
Rowan University
Robert V. Hogg
University of Iowa
Journal of Statistics Education Volume 9, Number 2 (2001)
Copyright © 2001 by Dexter C. Whittinghill and
Robert V. Hogg, all rights reserved.
This text may be freely shared among individuals, but it
may not be republished in any medium without express
written consent from the authors and advance notification
of the editor.
Key Words: Confidence intervals; Efficiency; Estimation; Maximum likelihood; Sufficiency; Tests of hypotheses.
We explore the varied uses of the uniform distribution on [θ − 1/2, θ + 1/2] as an example in the undergraduate probability and statistics sequence or the mathematical statistics course. Like its cousin, the uniform distribution on [0, θ], this density provides tractable examples on topics ranging from order statistics to hypothesis tests. Unlike its cousin, which appears in many probability and statistics books, this uniform is less well known and less often used. We discuss maximum likelihood estimators, likelihood ratio tests, confidence intervals, joint distributions of order statistics, use of Mathematica®, sufficiency, and other advanced topics. Finally, we suggest a few exercises on deriving likelihood ratio tests when the range is unknown as well, or for the uniform on [0, θ].
1. Introduction

Anyone who has taught undergraduate probability and statistics knows that it is sometimes difficult to find good examples of families of density functions that can be used in the classroom or for challenging homework for the better students. The 'common' families of one-parameter probability functions (such as the geometric or exponential) are almost surely used as an example or in one of the textbook problems. When trying to 'cook up' an example, often the instructor finds that the author of the book has already used the recipe! For illustration, we often think of a uniform density on [0, θ], but it is in nearly every probability and statistics text.
When instructors get more creative ('the author will never
have thought of this!'), they find that their wonderful
mass or density function has a maximum likelihood estimator
with a nearly intractable distribution. Exasperation sets
in soon thereafter.
Here is a simple distribution that appears in two textbooks by the second author. It is also in Miller and Miller (1999) and Mood, Graybill, and Boes (1974), but not in many other texts. It is the uniform distribution on [θ − 1/2, θ + 1/2], with density

f(x; θ) = I[θ − 1/2, θ + 1/2](x),   (1)

where I[a,b](x) is an indicator function that is 1 when the argument is in the interval, and 0 when it is not. Other texts sometimes use the uniform on other fixed ranges centered at the parameter. The density (1) is a location-parameter alternative to the uniform on [0, θ].
When teaching probability and statistics, often the greatest need for help is in finding examples when the course reaches 'parametric point estimation.' That is, we want to estimate the value of the unknown parameter or parameters in a family of distributions, and to make statistical inferences we need to know the distributions of the estimators. Before presenting the flexibility of the uniform distribution on [θ − 1/2, θ + 1/2], we first consider how such a model might arise.

Consider the situation where you are waiting to catch a bus. You want to estimate θ, the mean number of minutes the bus takes to reach your stop. We note two things. First, the interval can be open or closed, and we will use a closed interval to avoid discussion of supremum versus maximum. Second, it is an easy homework problem for students to show that the mean of the distribution is θ. The joint density for a random sample of n observations is the deceivingly simple

f(x1, …, xn; θ) = I(θ − 1/2 ≤ y1 and yn ≤ θ + 1/2),

where y1 ≤ ⋯ ≤ yn are the observed order statistics.
In general, we present our students with different ways of finding estimators in probability and statistics. Because θ is the mean of the distribution for X, our students usually suggest the sample mean X̄ (which we will call W1).
If we are lucky, our students will suggest other intuitive estimators, such as the median (which we will call W3) or the midrange, which is the average of the minimum and maximum order statistics, say W2.

The likelihood function for a random sample is the joint density of the random sample, but we consider it as a function of θ. The likelihood function of a random sample from a distribution with density (1) is

L(θ) = I[yn − 1/2, y1 + 1/2](θ),   (2)

that is, L(θ) = 1 when yn − 1/2 ≤ θ ≤ y1 + 1/2 and 0 otherwise. Unlike many other situations, here the MLE of θ is not unique: any statistic whose value always falls in [Yn − 1/2, Y1 + 1/2] is an MLE of θ.

Which of our three unbiased estimators,
W1, W2, or
W3, is best? Are there better
estimators? If class discussion has not already led to the
notion that we are searching for unbiased estimators with
small variance, then the idea of minimum variance unbiased
estimators or efficiency can be raised here. When
comparing W1, W2, and
W3, we must calculate the mean and
variance of each of the three estimators. The case of W1 is straightforward: E[X̄] = θ and Var[X̄] = 1/(12n).
Each of the midrange (W2) and the median (W3) requires knowledge of distributions of order statistics. Depending on the level of the students or the book, the calculations for the cases involving order statistics can be assigned or not (with the results being given to the students) in various combinations. For the better students these are challenging homework problems, and we note some of the difficulties. Because the Zi = Yi − (θ − 1/2), i = 1, …, n, behave as the order statistics of a random sample from the uniform distribution on [0,1], we work with the Zi, where F and f are the respective distribution and density function of the uniform distribution on [0,1]. Using F(z) = z and f(z) = 1 in the order-statistic density, it is easy to show that E[Z1] = 1/(n + 1) and E[Zn] = n/(n + 1), making W2 unbiased. Also it is an easy exercise to find the variances, namely Var[Z1] = Var[Zn] = n/((n + 1)²(n + 2)). To find Var[W2], we need Cov(Z1, Zn). Hence the joint density for Z1 and Zn is

g(z1, zn) = n(n − 1)(zn − z1)^(n−2),  0 ≤ z1 ≤ zn ≤ 1,

and Cov(Z1, Zn) = 1/((n + 1)²(n + 2)) after some straightforward integration. Thus, finally, Var[W2] = 1/(2(n + 1)(n + 2)).
Remarks. The transformation used was Zi = Yi − (θ − 1/2), i = 1, …, n.
Again in this discussion, we let Zi = Yi − (θ − 1/2). For n = 2m + 1, the median is W3 = Ym+1, and Zm+1 has the density of a beta distribution with α = β = m + 1. It follows that E[W3] = θ and Var[W3] = 1/(4(n + 2)). If the students have not studied the beta distribution, a computer algebra system (CAS) such as Mathematica® or Maple® can be used to establish the results. For n = 2m, the median is W3 = (Ym + Ym+1)/2. Moreover, the marginal distributions of Zm and Zm+1 are, respectively, beta with α = m, β = m + 1 and beta with α = m + 1, β = m. Also, E[W3] = θ. As before, to find Var[W3] we must evaluate Cov(Zm, Zm+1) = m²/((n + 1)²(n + 2)). The integration is straightforward, or a CAS can be used. Finally, algebra yields Var[W3] = n/(4(n + 1)(n + 2)) for even n.
In most cases you would not make the students do all of the work indicated above, and again Mathematica® or Maple® could be used. However, after they have suggested the various estimators, and found the mean and variance of at least two (including W1), the comparisons become accessible. Table 1 gives the variances and shows the relative efficiencies of the estimators. The uniform on another fixed range has a density like (1) shifted, but it is essentially the same function. Often more interesting problems can be created when the uniform distribution is on [θ1, θ2], with both θ1 and θ2 unknown, and we will discuss these briefly.
The uniform distribution on [θ − 1/2, θ + 1/2] provides a rich assortment of material for discussions or examples. There are many unbiased and consistent estimators of θ to compare, including the familiar X̄ as the method of moments estimator. The rather simple likelihood function yields an interesting uncountable set of maximum likelihood estimators. The density also provides fairly simple distributions for the order statistics. Using a modified likelihood ratio test, sensible and intuitive tests of hypotheses and confidence intervals can easily be derived. Finally, for more advanced students, the sufficiency results are not straightforward (we have joint sufficiency), and completeness and minimal sufficiency can also be discussed.
2. The Uniform Distribution on [θ − 1/2, θ + 1/2]

We begin with a discussion of how such a model might arise.
The parameter θ is the average number of minutes that the bus takes to travel from the previous stop to your stop (you can see that other stop up the road). Each day you can take an observation on how long the bus takes and use the observations to estimate θ. Let us assume that under 'usual conditions' the variation induced by the traffic causes the bus to take between θ minus and θ plus a fixed unit of time, say one-half a minute for convenience, to arrive at your stop. Moreover, it is reasonable to assume that the distribution is uniform. If we let the random variable X be the number of minutes for the trip, then X has a probability density function (density) given by (1).
The mean of the distribution is θ, the variance is 1/12, and the cumulative distribution function is F(x; θ) = x − (θ − 1/2) for θ − 1/2 ≤ x ≤ θ + 1/2. In what follows, Y1 ≤ Y2 ≤ ⋯ ≤ Yn are the order statistics. If an instructor does not stress the derivation of distributions of order statistics (we recommend covering it), the distributions of the minimum Y1 and maximum Yn, along with their joint distributions, can be given to the students. The students can also simply work with data. (See Hogg and Tanis 1997, problems 6.1.15 and 10.1.11.)
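As a quick classroom illustration, the model and the three intuitive estimators can be simulated in a few lines. This is a sketch of ours, not from the paper; θ = 4 and n = 25 are arbitrary choices for the demonstration.

```python
# Illustrative sketch (not from the paper): simulate bus travel times
# X ~ Uniform[theta - 1/2, theta + 1/2] and compute three estimators.
import random

random.seed(1)
theta, n = 4.0, 25           # arbitrary true mean travel time and sample size

# X = (theta - 1/2) + U with U ~ Uniform[0, 1]
x = sorted(theta - 0.5 + random.random() for _ in range(n))

w1 = sum(x) / n              # W1: sample mean (method of moments)
w2 = (x[0] + x[-1]) / 2      # W2: midrange of the order statistics
w3 = x[n // 2]               # W3: median (n is odd here)

print(w1, w2, w3)            # all three land within 1/2 of theta
```

Because the support has width one, every estimator that is an average of observations is automatically within 1/2 of θ, which makes this a safe first exercise.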
Because θ is the mean of the distribution for X, our students usually suggest the sample mean X̄ as an estimator (which we will call W1). This intuitive result is the method of moments estimator of θ and leads nicely into a discussion of that topic. X̄ is also the least-squares estimator of θ. (See Hogg and Tanis 1997, problem 6.1.15.)
All three of them are unbiased for θ.
Comparison of the variances leads to the ideas of minimum
variance and efficiency. At this point, the instructor can
go in many different directions, but, because of the
desirable properties of the maximum likelihood estimators,
these are often discussed next. (See Miller and Miller
1999, problems 10.3 and 10.30.)
3. Maximum Likelihood and Efficiency
We consider the likelihood as a function of θ for a given sample, not as a function of the sample for a given θ. We seek maximum likelihood estimators (MLEs) because they can possess useful properties. For instance, if θ̂ is an MLE for θ, and h is a function with a single-valued inverse, then h(θ̂) is the MLE of h(θ).
(See Casella and Berger 1990, Theorem 7.2.1, for a more
advanced invariance property, where h is any
function, not necessarily with a single-valued inverse.)
Also, under certain regularity conditions, the sequence of
MLEs (one for each n) is best asymptotically normal,
or BAN. Textbooks are full of examples for which these
results apply, but they sometimes lack examples that fail
to satisfy regularity conditions. One of the 'charms' of
our example is that it does not meet the regularity
conditions needed for many results.
The MLE from (2) is not unique. A careful look at the likelihood function shows that any statistic u(X1, …, Xn) satisfying Yn − 1/2 ≤ u(X1, …, Xn) ≤ Y1 + 1/2 is an MLE of θ. This includes the midrange, W2.
By way of examples, the student should demonstrate that W1 (the sample mean) and W3 (the median) are not maximum likelihood estimators of θ, even though they are unbiased. Of course every estimator of the form a(Yn − 1/2) + (1 − a)(Y1 + 1/2), with 0 ≤ a ≤ 1, is an MLE of θ.
(See Hogg and Craig 1995, problem
6.3; Hogg and Tanis 1997, problems 6.1.15 and 10.1.11;
Miller and Miller 1999, problems 10.75 and 76; and Mood,
Graybill, and Boes 1974, example VII.7.)
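A student can verify the flat likelihood numerically. The sketch below (ours; the seed, θ, and n are arbitrary) checks that (2) equals 1 everywhere on the interval of MLEs and 0 just outside it.

```python
# Sketch: the likelihood (2) is 1 for every theta in
# [max(x) - 1/2, min(x) + 1/2] and 0 outside, so the MLE is not unique.
import random

random.seed(2)
theta, n = 0.0, 10
x = [theta - 0.5 + random.random() for _ in range(n)]

def likelihood(t, data):
    # product of the indicators I_[t - 1/2, t + 1/2](x_i)
    return 1.0 if all(t - 0.5 <= xi <= t + 0.5 for xi in data) else 0.0

lo, hi = max(x) - 0.5, min(x) + 0.5   # every point in [lo, hi] is an MLE
midrange = (min(x) + max(x)) / 2

print(likelihood(midrange, x), likelihood(lo, x), likelihood(hi + 0.01, x))
```

The midrange and both endpoints give likelihood 1, while a point 0.01 outside the interval gives likelihood 0.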
The case of W1 is straightforward, as it is a consequence of standard formulas like E[X̄] = θ and Var[X̄] = σ²/n.
3.1 The Sample Midrange and the Joint Density for Two Order Statistics
The quantities Zi = Yi − (θ − 1/2), i = 1, …, n, are order statistics and have the distributions of the respective order statistics for a random sample from the uniform distribution on [0,1]. Of course the means change, but those of Yi are easily found from those of Zi, and the variances and covariances are the same. The density for the kth order statistic is given on page 198 of Hogg and Craig (1995) as

g_k(z) = [n!/((k − 1)!(n − k)!)] z^(k−1) (1 − z)^(n−k),  0 ≤ z ≤ 1.   (3)

From (3), E[Z1] = 1/(n + 1) and E[Zn] = n/(n + 1), so E[Y1] = θ − 1/2 + 1/(n + 1) and E[Yn] = θ − 1/2 + n/(n + 1). Thus

E[W2] = E[(Y1 + Yn)/2] = θ.   (4)
Each Zi = Yi − (θ − 1/2) is a pivotal quantity here, because the transformation made the distribution of Zi independent of θ, whatever θ was. Pivotal quantities, generally discussed in the more
rigorous undergraduate texts, are also very helpful in
constructing confidence intervals. (See Casella and Berger
1990 or Mood, Graybill, and Boes 1974 for further
discussion.) Another way of finding Var[W2]
is to first find the distribution of W2
using the 'distribution function technique' (see page 222
of Hogg and Tanis 1997).
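The variance formula for the midrange is easy to check by simulation. This Monte Carlo sketch is ours; the seed, n, and replication count are arbitrary choices.

```python
# Sketch: estimate Var[W2] by simulation and compare with 1/(2(n+1)(n+2)).
import random

random.seed(3)
n, reps, theta = 10, 100_000, 0.0

vals = []
for _ in range(reps):
    x = [theta - 0.5 + random.random() for _ in range(n)]
    vals.append((min(x) + max(x)) / 2)     # the midrange W2

mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / reps
exact = 1 / (2 * (n + 1) * (n + 2))        # = 1/264 for n = 10

print(round(mean, 4), round(var, 5), round(exact, 5))
```

The simulated mean is close to θ (unbiasedness) and the simulated variance agrees with 1/(2(n + 1)(n + 2)) to Monte Carlo accuracy.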
3.2 The Sample Median
For n = 2m + 1, the median is W3 = Ym+1, and Zm+1 has a beta distribution with α = β = m + 1. Hence E[Zm+1] = 1/2; so W3 is unbiased, and Var[W3] = 1/(4(n + 2)). Moreover, for n = 2m, Zm and Zm+1 have beta distributions with parameters (m, m + 1) and (m + 1, m), respectively. Hence Var[W3] = n/(4(n + 1)(n + 2)) for even n.
Having found the mean and variance of at least two of the estimators (including W1), an instructor could present all of the variances and have the students compare them by calculating the relative efficiency of one estimator to another. Having the students derive or check some of the efficiency results requires them to work with inequalities, something they may not have done for quite some time in their undergraduate careers.
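The median's variance for odd n can be checked the same way. In this sketch of ours, sampling Uniform[0,1] data corresponds to θ = 1/2; n = 9 and the seed are arbitrary.

```python
# Sketch: for odd n the sample median of Uniform[0,1] data is
# Beta(m+1, m+1) with variance 1/(4(n+2)); check by simulation.
import random

random.seed(4)
n, reps = 9, 100_000            # n = 2m + 1 with m = 4; theta = 1/2 here

meds = []
for _ in range(reps):
    x = sorted(random.random() for _ in range(n))
    meds.append(x[n // 2])

m = sum(meds) / reps
var = sum((v - m) ** 2 for v in meds) / reps
exact = 1 / (4 * (n + 2))       # = 1/44 for n = 9

print(round(m, 3), round(var, 4), round(exact, 4))
```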
3.3 Comparison of the Three Estimators of θ

Table 1 compares the mean (W1), the midrange (W2), and the median (W3). It contains some interesting results. Of course, for n ≥ 3, the midrange is more efficient than the mean, which is more efficient than the median. That is, the average of the two extreme values is better than the average of all of the values, which is better than the median for estimating the center of this fixed-width density. At first this might seem amazing, but the midrange is based on two order statistics, each of which has a smaller variance than any Yi. It should also be noted that for the sample median, the case of even n requires the joint distribution of the two middle order statistics.
Table 1. Variances and Relative Efficiencies of the Mean, Median, and Midrange

Estimator: | W1, mean | W2, midrange | W3, median |
Variance: | 1/(12n) | 1/(2(n+1)(n+2)) | n even: n/(4(n+1)(n+2)); n odd: 1/(4(n+2)) |
Efficiency of W1 to above: | — | 6n/((n+1)(n+2)) | n even: 3n²/((n+1)(n+2)); n odd: 3n/(n+2) |
Efficiency of W2 to above: | — | — | n even: n/2; n odd: (n+1)/2 |
NOTE: See Hogg and Tanis (1997), problem 10.1.11, for a comparison of the mean, median, and midrange for this distribution.
Remark. If the median is to be used for an odd sample size, don't throw away an observation! Statisticians do not recommend throwing away or ignoring any data. Instead, for a sample of size n, one can use an estimator that makes use of all of the observations.

Finally, when discussing the Cramér-Rao Lower Bound, this example can be used as one where the conditions of the theorem are not met: the domain depends on the parameter being estimated. The students can also show that all three estimators are consistent; each is unbiased and its variance converges to zero as n increases. In that regard, it is interesting to note that the variances of the mean and median are of order 1/n, while that of the midrange is of order 1/n². That is, for large n, the variance of the midrange is much smaller.
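The entries of Table 1 can also be checked symbolically. This sketch of ours encodes the three variance formulas with exact rational arithmetic and confirms the ordering and the clean efficiency ratios.

```python
# Sketch: the variance formulas of Table 1, in exact arithmetic.
from fractions import Fraction as F

def var_mean(n):     return F(1, 12 * n)
def var_midrange(n): return F(1, 2 * (n + 1) * (n + 2))
def var_median(n):   # n even vs. n odd
    return F(n, 4 * (n + 1) * (n + 2)) if n % 2 == 0 else F(1, 4 * (n + 2))

# At n = 2 all three estimators coincide, and so do the variances.
assert var_mean(2) == var_midrange(2) == var_median(2) == F(1, 24)

# For n >= 3: midrange beats mean, which beats median.
for n in range(3, 51):
    assert var_midrange(n) < var_mean(n) < var_median(n)

# Efficiency of W2 relative to W3 is (n+1)/2 for odd n, n/2 for even n.
print(var_median(11) / var_midrange(11))   # -> 6
```

Using Fraction rather than floats means the inequalities are verified exactly, with no rounding concerns.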
4. Sufficiency

The topic of sufficiency is very important for finding good estimators, but it can be very difficult for students to grasp. It is important because sufficient statistics
are associated with good estimators. Unique MLEs are
functions of sufficient statistics, if they exist (see Rice
1995, p. 284, Corollary A). Unbiased estimators that are
functions of sufficient statistics have smaller variances,
and unbiased estimators derived from sufficient statistics
with 'complete' families are uniformly minimum variance
unbiased estimators. The concept that a sufficient
statistic contains all of the information from the sample
necessary for estimating the parameter sounds simple and
intuitive. However, the definition of a sufficient
statistic is very technical and not always written
intuitively. Many authors use a definition similar to: the statistic W is sufficient for θ if and only if the conditional distribution of the sample, given W = w, does not depend on θ, for any value w of W. (See Miller and Miller 1999, Definition 10.3, and Mood, Graybill, and Boes 1974, Definition 15.) Such definitions offer little guidance for actually finding a sufficient statistic from the sample. Yet because sufficiency is so important,
authors go to great lengths, committing much text space
with description and examples, in order to explain the
concept. To help apply the concept of sufficiency to
finding good estimators, authors present one or two
factorization theorems. Even then, for a student new to
the material, it can be difficult seeing that a
factorization theorem follows intuitively from the
concept. In Hogg and Craig (1995), over two pages of
discussion precedes the definition, and then the definition
is presented in a functional form like a factorization
theorem. Hogg and Tanis (1997) wait until the end of the
book to discuss the topic, and use the factorization
theorem as the definition.
Let us assume that sufficiency has been defined in the course, and that a one-statistic-one-parameter factorization theorem like Theorem 1 (p. 318) of Hogg and Craig (1995) has been presented. The random variable of (1) provides a good second or third example for discussing sufficiency. The likelihood function in (2) is:
L(θ) = I[yn − 1/2, y1 + 1/2](θ).   (5)
From this expression, some students will say that you can't rewrite (5) so that there is a one-statistic factorization, and hence there is no single sufficient statistic u1. Some may in fact come up
with the idea that Y1 and
Yn are jointly sufficient
statistics. Example 1, page 348 of Hogg and Craig (1995),
shows that Y1 and
Yn are jointly sufficient for θ.
They also point out that these are minimally sufficient and
state that there can't be one sufficient statistic. So the
density of (1) shows the student that not all examples work out nicely. The fact that Y1 and
Yn are joint sufficient statistics
for θ
explains why the variance of the midrange is so much less
than for W1 and W3.
Remark. Mood, Graybill, and Boes (1974) give an alternate definition of sufficiency (Definition 16): W is sufficient if and only if the conditional distribution of T given W does not depend on θ, for any statistic T.
The density (1) is also useful for illustrating other topics. Mood, Graybill, and Boes (1974) show that θ is a location parameter (example 37, p. 333), and then that W2 is the uniformly smallest mean-squared error estimator in the class of location invariant estimators (example 39, p. 335). Hogg and Craig (1995) essentially show that Y1 and Yn are not complete (example 2, p. 349), and discuss the idea of ancillary statistics (example 5, p. 357). We have noted that the MLE is not unique, as any weighted mean of Yn − 1/2 and Y1 + 1/2 will serve as an MLE. However, none of these is sufficient alone; so there is no single minimal sufficient statistic for θ, because to enjoy that property the MLE must be sufficient.
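The factorization argument can be made concrete numerically. In this sketch of ours, with made-up samples, two samples sharing the same minimum and maximum produce identical likelihood functions, which is the point of the joint sufficiency of (Y1, Yn).

```python
# Sketch: the likelihood (5) depends on the data only through the
# minimum and maximum, so samples agreeing in (y1, yn) are
# indistinguishable as far as inference about theta goes.
def likelihood(t, data):
    return 1.0 if all(t - 0.5 <= xi <= t + 0.5 for xi in data) else 0.0

a = [0.1, 0.2, 0.5, 0.9]          # hypothetical sample
b = [0.1, 0.44, 0.61, 0.9]        # different sample, same min and max

grid = [i / 100 for i in range(-50, 151)]   # theta values in [-0.5, 1.5]
same = all(likelihood(t, a) == likelihood(t, b) for t in grid)
print(same)
```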
5. Tests of Hypotheses and Confidence Intervals

After discussing the properties of good estimators, it is easy to forget that we wanted to estimate the time the bus takes to travel between stops. Let us agree to use W2, the midrange. It is unbiased, a maximum likelihood estimator, a function of sufficient statistics (if not sufficient itself), and has as small a variance as we have seen. Therefore we proceed being confident that we are doing a good job of estimation, if not 'the best' job. This suggests that W2 ± c, where c is a constant such that P(W2 − c ≤ θ ≤ W2 + c) = 1 − α, might be the best confidence interval for θ.
However, this is not the case, although a good student
might find it an interesting exercise to find that constant
c by determining the distribution of
W2.
We illustrate a better way by first finding a good test of H0: θ = θ0 against H1: θ ≠ θ0, and then we use the relationship between tests and confidence intervals to find a confidence interval for θ. The likelihood ratio is given by

λ = L(θ0) / L(θ̂),

where θ̂ is a maximum likelihood estimator such as W2. Note that λ = 1 provided Yn − 1/2 ≤ θ0 ≤ Y1 + 1/2, but λ = 0 if θ0 < Yn − 1/2 or θ0 > Y1 + 1/2. Thus, strictly following the likelihood ratio criterion, we would reject H0 if θ0 < Yn − 1/2 or θ0 > Y1 + 1/2. That is, we would accept H0 if Yn − 1/2 ≤ θ0 ≤ Y1 + 1/2. Clearly, if H0 is true, we would never reject H0 with this test, and the significance level is 0.
However, we must be concerned about the power of the test when θ ≠ θ0. This suggests that we could improve the power if we selected a constant c slightly less than 1/2 and accepted H0 if Yn − c ≤ θ0 ≤ Y1 + c; otherwise reject H0. Let us find c so that the significance level is α. Thus we want, when θ = θ0, that

P(Yn − c ≤ θ0 ≤ Y1 + c; θ0) = (2c)^n = 1 − α.

Accordingly, c = (1/2)(1 − α)^(1/n).
Going 'backwards' from this test, the corresponding confidence interval for θ is [Yn − c, Y1 + c].
(In problem 4 of their Chapter VIII, Mood, Graybill, and Boes (1974) ask the student to verify that
(Y1, Yn) is a
confidence interval, and to find the confidence
coefficient. In problem 21 of Chapter VIII, they give the
student the more open-ended 'find a good confidence
interval' for θ.)
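The interval [Yn − c, Y1 + c] with c = (1 − α)^(1/n)/2 is easy to check by simulation. This sketch is ours; θ, n, α, and the seed are arbitrary choices.

```python
# Sketch: coverage of [Yn - c, Y1 + c] should be 1 - alpha, since the
# interval covers theta exactly when all observations fall in
# [theta - c, theta + c], an event of probability (2c)^n = 1 - alpha.
import random

random.seed(6)
n, alpha, reps, theta = 8, 0.10, 100_000, 2.0
c = 0.5 * (1 - alpha) ** (1 / n)

cover = 0
for _ in range(reps):
    x = [theta - 0.5 + random.random() for _ in range(n)]
    if max(x) - c <= theta <= min(x) + c:
        cover += 1

print(round(cover / reps, 3))     # close to 0.9
```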
The power of this test is easy for a student to compute by evaluating K(θ) = P(reject H0; θ). Clearly, when θ = θ0, K(θ0) = α. Of course, K(θ) is symmetric about θ0, and hence we need only consider θ > θ0. If θ ≥ θ0 + 1/2 + c, then all observations fall outside [θ0 − c, θ0 + c] with probability one, and K(θ) = 1. So when θ0 < θ < θ0 + 1/2 + c, the power rises from α toward 1; in fact K(θ) = 1 − (length of the overlap of [θ0 − c, θ0 + c] and [θ − 1/2, θ + 1/2])^n.
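The power function can be coded directly from the overlap argument. This sketch is ours; the closed form below follows from acceptance requiring every observation to fall in [θ0 − c, θ0 + c].

```python
# Sketch: power of the test that rejects H0: theta = theta0 unless
# Yn - c <= theta0 <= Y1 + c.  Acceptance happens when every observation
# lies in [theta0 - c, theta0 + c], with probability equal to the length
# of the overlap with the support [theta - 1/2, theta + 1/2], to the n.
n, alpha, theta0 = 8, 0.10, 0.0
c = 0.5 * (1 - alpha) ** (1 / n)

def power(theta):
    overlap = max(0.0, min(theta0 + c, theta + 0.5) - max(theta0 - c, theta - 0.5))
    return 1 - overlap ** n

print(round(power(theta0), 3))    # the significance level alpha
print(power(theta0 + 1.0))        # 1.0 once the supports separate
```

The printed values show the two landmark cases: power α at θ = θ0 and power 1 when the supports no longer overlap.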
6. Exercises for the Student

We present several exercises when considering the uniform distribution on [θ − ρ/2, θ + ρ/2], where the range ρ is also unknown; thus both θ and ρ must be estimated. These are analogous to problems associated with the normal distribution when σ is unknown, and bright students can often think of the two-sample and k-sample problems on their own
(possibly with a little direction from the instructor). In
each of the following situations, the student is asked to
show that the likelihood ratio
equals the form given. Students may find these
'easy' or 'hard' depending on their insight.
Exercise 6.1 tests H0: θ = θ0 against H1: θ ≠ θ0 with the range unknown. Then

λ = [(yn − y1) / (2 max(θ0 − y1, yn − θ0))]^n,

where y1 ≤ ⋯ ≤ yn are the order statistics of a random sample of size n.
Using the same notation as in Exercise 6.1,
For convenience, let
and
be the order statistics of the independent random samples. Then
Using the same assumptions and notation of Exercise 6.3,
For convenience, let all k independent random
samples have the same sample size n, and
be the order statistics of the ith
sample. Then
Using the same assumptions and notation of Exercise 6.5,
To determine the significance level of each of the above tests that rejects H0 if λ ≤ c for some constant c, we need to determine the distribution of λ or some appropriate function of λ. These are much more difficult exercises.
Acknowledgments

First we must thank the referees, who made great suggestions, and one of whom caught a major error in one of our statements. We certainly appreciate their time and effort. The estimator in the Remark in Section 3.3 is theirs. We also thank the other authors whom we have referenced and who have dabbled with this interesting, one-parameter example. Their work has contributed to this paper.
Although the first author was dismayed when he found out that 'his' wonderful example was already in some of the literature (some published over 40 years ago; see Hogg and Craig (1956) and the 1959 edition of their book), he realized that he was not the only person to think this example was a good one. He then contacted the second author about presenting many of the applications of these uniform distributions in one document. When the first author later found the example in Mood, Graybill, and Boes (1974), a book he had used at least eleven years prior to his 'creating' the example, he almost thought that he had subconsciously plagiarized! However, with the continued rise in importance of statistics education and the fact that no single textbook has used this uniform distribution to any extent, the authors felt the project was still worthwhile.
The authors have one request: readers who find uniform distributions of the type considered here, referenced in any book or article, should contact the first author with the references. This is especially true of those who have written the book or article themselves! We have looked at many texts, but certainly not all of them.
References

Casella, G., and Berger, R. L. (1990), Statistical Inference, Belmont, CA: Duxbury.
Hogg, R. V., and Craig, A. T. (1956), "Sufficient Statistics In Elementary Distribution Theory," Sankhya, 17, 209.
----- (1995), Introduction to Mathematical Statistics (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Hogg, R. V., and Tanis, E. A. (1997), Probability and Statistical Inference (5th ed.), Englewood Cliffs, NJ: Prentice Hall.
Miller, I., and Miller, M. (1999), John E. Freund's Mathematical Statistics (6th ed.), Upper Saddle River, NJ: Prentice Hall.
Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), Introduction to the Theory of Statistics (3rd ed.), NY: McGraw-Hill.
Rice, J. A. (1995), Mathematical Statistics and Data Analysis (2nd ed.), Belmont, CA: Duxbury.
Dexter C. Whittinghill
Department of Mathematics
Rowan University
201 Mullica Hill Rd.
Glassboro, NJ 08028
Robert V. Hogg
Department of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242