# A Little Uniform Density With Big Instructional Potential

Dexter C. Whittinghill
Rowan University

Robert V. Hogg
University of Iowa

Journal of Statistics Education Volume 9, Number 2 (2001)

This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Confidence intervals; Efficiency; Estimation; Maximum likelihood; Sufficiency; Tests of hypotheses.

## Abstract

We explore the varied uses of the uniform distribution on $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$ as an example in the undergraduate probability and statistics sequence or the mathematical statistics course. Like its cousin, the uniform distribution on $[0, \theta]$, this density provides tractable examples on topics ranging from order statistics to hypothesis tests. Unlike its cousin, which appears in many probability and statistics books, this uniform is less well known and less often used. We discuss maximum likelihood estimators, likelihood ratio tests, confidence intervals, joint distributions of order statistics, use of Mathematica®, sufficiency, and other advanced topics. Finally, we suggest a few exercises deriving likelihood ratio tests when the range is unknown as well, or for the uniform on $[\theta_1, \theta_2]$.

## 1. Introduction

Anyone who has taught undergraduate probability and statistics knows that it is sometimes difficult to find good examples of families of density functions that can be used in the classroom or for challenging homework for the better students. The 'common' families of one-parameter probability functions (such as the geometric or exponential) are almost surely used as an example or in one of the textbook problems. When trying to 'cook up' an example, the instructor often finds that the author of the book has already used the recipe! For illustration, we often think of a uniform density on $[0, \theta]$, but it is in nearly every probability and statistics text. When instructors get more creative ('the author will never have thought of this!'), they find that their wonderful mass or density function has a maximum likelihood estimator with a nearly intractable distribution. Exasperation sets in soon thereafter.

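Here is a simple distribution that appears in two textbooks by the second author. It is also in Miller and Miller (1999) and Mood, Graybill, and Boes (1974), but not in many other texts. It is the uniform distribution on $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$, with density

$$f(x; \theta) = I_{[\theta - 1/2,\; \theta + 1/2]}(x), \qquad -\infty < \theta < \infty. \qquad (1)$$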

$I_{[a,b]}(x)$ is an indicator function that is 1 when the argument is in the interval $[a, b]$, and 0 when it is not. Other texts sometimes use the uniform on other fixed-width ranges, such as $[\theta, \theta + 1]$, but this is essentially the same density shifted. More interesting problems can often be created when the uniform distribution is on $[\theta_1, \theta_2]$, with both $\theta_1$ and $\theta_2$ unknown, and we will discuss these briefly.

The density (1) is a location-parameter alternative to the uniform on $[0, \theta]$, and it provides a rich assortment of material for discussions or examples. There are many unbiased and consistent estimators of $\theta$ to compare, including the familiar $\bar{X}$ as the method of moments estimator. The rather simple likelihood function yields an interesting uncountable set of maximum likelihood estimators. The density also provides fairly simple distributions for the order statistics. Using a modified likelihood ratio test, sensible and intuitive tests of hypotheses and confidence intervals can easily be derived. Finally, for more advanced students, the sufficiency results are not straightforward (we have joint sufficiency), and completeness and minimal sufficiency can also be discussed.

## 2. The Uniform Distribution on $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$

When teaching probability and statistics, the greatest need for help is often in finding examples when the course reaches 'parametric point estimation.' That is, we want to estimate the value of the unknown parameter or parameters in a family of distributions, and to make statistical inferences we need to know the distributions of the estimators. Before presenting the flexibility of the uniform distribution on $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$, we begin with a discussion of how such a model might arise.

Consider the situation where you are waiting to catch a bus. You want to estimate $\theta$, the average number of minutes that the bus takes to travel from the previous stop to your stop (you can see that other stop up the road). Each day you can take an observation on how long the bus takes and use the observations to estimate $\theta$. Let us assume that under 'usual conditions' the variation induced by the traffic causes the bus to take between $\theta$ minus and $\theta$ plus a fixed unit of time, say one-half a minute for convenience, to arrive at your stop. Moreover, it is reasonable to assume that the distribution is uniform. If we let the random variable X be the number of minutes for the trip, then X has a probability density function (density) given by (1).

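We note two things. First, the interval can be open or closed, and we will use a closed interval to avoid discussion of supremum versus maximum. Second, it is an easy homework problem for students to show that the mean of the distribution is $\theta$, the variance is 1/12, and the cumulative distribution function is

$$F(x; \theta) = \begin{cases} 0, & x < \theta - \tfrac{1}{2}, \\ x - \theta + \tfrac{1}{2}, & \theta - \tfrac{1}{2} \le x < \theta + \tfrac{1}{2}, \\ 1, & x \ge \theta + \tfrac{1}{2}. \end{cases}$$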
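A short simulation can make these homework answers concrete. The sketch below is ours, not the authors': it assumes Python 3.8+ with only the standard library, a hypothetical true mean travel time of θ = 7 minutes, and an arbitrary seed. It checks the mean and variance empirically and codes the distribution function F directly.

```python
import random
import statistics

def sample_uniform_location(theta, n, rng):
    """Draw n observations from the uniform distribution on [theta - 1/2, theta + 1/2]."""
    return [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]

def cdf(x, theta):
    """Distribution function F(x; theta) of the uniform on [theta - 1/2, theta + 1/2]."""
    if x < theta - 0.5:
        return 0.0
    if x >= theta + 0.5:
        return 1.0
    return x - theta + 0.5

rng = random.Random(2001)   # arbitrary fixed seed, for reproducibility
theta = 7.0                 # hypothetical mean travel time in minutes
xs = sample_uniform_location(theta, 100_000, rng)
sample_mean = statistics.fmean(xs)
sample_var = statistics.fmean((x - sample_mean) ** 2 for x in xs)

print(sample_mean)   # should be close to theta = 7
print(sample_var)    # should be close to 1/12 (about 0.0833)
print(cdf(theta, theta), cdf(theta - 1, theta), cdf(theta + 1, theta))  # 0.5 0.0 1.0
```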

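The joint density for a random sample of n observations is the deceptively simple

$$f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\; \theta + 1/2]}(x_i) = I_{[\theta - 1/2,\; \infty)}(y_1)\, I_{(-\infty,\; \theta + 1/2]}(y_n),$$

where $y_1 \le y_2 \le \cdots \le y_n$ are the observed values of the order statistics $Y_1 \le Y_2 \le \cdots \le Y_n$. If an instructor does not stress the derivation of distributions of order statistics (we recommend covering it), the distributions of the minimum Y1 and maximum Yn, along with their joint distribution, can be given to the students. The students can also simply work with data. (See Hogg and Tanis 1997, problems 6.1.15 and 10.1.11.)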

In general, we present our students with different ways of finding estimators in probability and statistics. Because $\theta$ is the mean of the distribution of X, our students usually suggest the sample mean $\bar{X}$ as an estimator (which we will call W1). This intuitive estimator is the method of moments estimator of $\theta$ and leads nicely into a discussion of that topic. The sample mean is also the least-squares estimator of $\theta$. (See Hogg and Tanis 1997, problem 6.1.15.)

If we are lucky, our students will suggest other intuitive estimators, such as the median (which we will call W3) or the midrange, which is the average of the minimum and maximum order statistics, say W2 = (Y1+Yn)/2. While discussing desirable properties of estimators, such as unbiasedness with smallest variance, W2 and W3 can be compared with $\bar{X}$. All three of them are unbiased for $\theta$. Comparison of the variances leads to the ideas of minimum variance and efficiency. At this point, the instructor can go in many different directions, but, because of the desirable properties of maximum likelihood estimators, these are often discussed next. (See Miller and Miller 1999, problems 10.3 and 10.30.)
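Before any algebra, students can see the unbiasedness of all three estimators in a simulation. The following is a minimal sketch under our own assumptions (standard-library Python, a hypothetical θ = 7, n = 5, and a fixed seed); `estimators` and `average_estimates` are illustrative names, not taken from any textbook.

```python
import random
import statistics

def estimators(sample):
    """Return (W1, W2, W3): the sample mean, the midrange, and the sample median."""
    ys = sorted(sample)
    w1 = statistics.fmean(ys)
    w2 = (ys[0] + ys[-1]) / 2
    w3 = statistics.median(ys)
    return w1, w2, w3

def average_estimates(theta, n, reps, seed=0):
    """Average each estimator over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    totals = [0.0, 0.0, 0.0]
    for _ in range(reps):
        sample = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        for j, w in enumerate(estimators(sample)):
            totals[j] += w
    return [t / reps for t in totals]

print(average_estimates(theta=7.0, n=5, reps=50_000))  # all three averages near 7.0
```

Averages close to θ are consistent with unbiasedness but do not prove it; the variances, which is where the estimators differ, are derived below.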

## 3. Maximum Likelihood and Efficiency

The likelihood function is the joint density of the random sample, but we consider it as a function of $\theta$ for a given sample, not as a function of the sample for a given $\theta$. We seek maximum likelihood estimators (MLEs) because they can possess useful properties. For instance, if $\hat{\theta}$ is an MLE for $\theta$, and h is a function with a single-valued inverse, then $h(\hat{\theta})$ is an MLE of $h(\theta)$. (See Casella and Berger 1990, Theorem 7.2.1, for a more advanced invariance property, where h is any function, not necessarily with a single-valued inverse.) Also, under certain regularity conditions, the sequence of MLEs (one for each n) is best asymptotically normal, or BAN. Textbooks are full of examples for which these results apply, but they sometimes lack examples that fail to satisfy the regularity conditions. One of the 'charms' of our example is that it does not meet the regularity conditions needed for many results.

The likelihood function of a random sample from a distribution with density (1) is

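$$L(\theta) = \prod_{i=1}^{n} I_{[\theta - 1/2,\; \theta + 1/2]}(x_i) = I_{[y_n - 1/2,\; y_1 + 1/2]}(\theta). \qquad (2)$$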

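Unlike many other situations, here the MLE of $\theta$ is not unique. A careful look at the likelihood function shows that any statistic $u(X_1, \ldots, X_n)$ satisfying

$$Y_n - \tfrac{1}{2} \le u(X_1, \ldots, X_n) \le Y_1 + \tfrac{1}{2}$$

is an MLE of $\theta$. This includes the midrange, W2. By way of examples, the student should demonstrate that $\bar{X}$ and W3 (the median) are not, in general, maximum likelihood estimators of $\theta$, even though they are unbiased. Of course every estimator of the form $c_1(Y_n - \tfrac{1}{2}) + c_2(Y_1 + \tfrac{1}{2})$, with $c_1 + c_2 = 1$ and $c_i \ge 0$, i = 1, 2, is an MLE, but it is not unbiased unless $c_1 = c_2 = \tfrac{1}{2}$, because (using the means of Y1 and Yn found in Section 3.1) its expectation is $\theta + (c_2 - c_1)/(n+1)$. (See Hogg and Craig 1995, problem 6.3; Hogg and Tanis 1997, problems 6.1.15 and 10.1.11; Miller and Miller 1999, problems 10.75 and 76; and Mood, Graybill, and Boes 1974, example VII.7.)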

Which of our three unbiased estimators, W1, W2, or W3, is best? Are there better estimators? If class discussion has not already led to the notion that we are searching for unbiased estimators with small variance, then the idea of minimum variance unbiased estimators or efficiency can be raised here. When comparing W1, W2, and W3, we must calculate the mean and variance of each of the three estimators. The case of $\bar{X}$ is straightforward, as it is a consequence of standard formulas like $E[\bar{X}] = \mu$ and $\operatorname{Var}[\bar{X}] = \sigma^2/n$; here $\mu = \theta$ and $\sigma^2 = 1/12$, so $\operatorname{Var}[W_1] = 1/(12n)$.

Each of the midrange (W2) and the median (W3) requires knowledge of distributions of order statistics. Depending on the level of the students or the book, the calculations for the cases involving order statistics can be assigned or not (with the results being given to the students) in various combinations. For the better students these are challenging homework problems, and we note some of the difficulties.

### 3.1 The Sample Midrange and the Joint Density for Two Order Statistics

Because W2 = (Y1+Yn)/2, we need the joint density for Y1 and Yn, as well as their marginal densities. In this section, we make the change of variables $Z_i = Y_i - \theta + \tfrac{1}{2}$, i = 1, 2, ..., n, to simplify the calculations. The $Z_i$ are order statistics and have the distributions of the respective order statistics for a random sample from the uniform distribution on [0,1]. Of course the means change, but those of the Yi are easily found from those of the Zi, and the variances and covariances are the same. The density for the kth order statistic is given on page 198 of Hogg and Craig (1995) as

$$g_k(z_k) = \frac{n!}{(k-1)!\,(n-k)!}\,[F(z_k)]^{k-1}\,[1 - F(z_k)]^{n-k}\, f(z_k), \qquad 0 < z_k < 1, \qquad (3)$$

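where F and f are the respective distribution and density functions of the uniform distribution on [0,1], so that $F(z) = z$ and $f(z) = 1$ for $0 < z < 1$. Using k = 1 and k = n, the densities for Z1 and Zn are, respectively,

$$g_1(z_1) = n(1 - z_1)^{n-1}, \quad 0 < z_1 < 1, \qquad g_n(z_n) = n z_n^{\,n-1}, \quad 0 < z_n < 1.$$

It is easy to show that $E[Z_1] = 1/(n+1)$ and $E[Z_n] = n/(n+1)$, so $E[Y_1] = \theta - \tfrac{1}{2} + 1/(n+1)$ and $E[Y_n] = \theta - \tfrac{1}{2} + n/(n+1)$. Thus

$$E[W_2] = \frac{E[Y_1] + E[Y_n]}{2} = \frac{1}{2}\left(2\theta - 1 + \frac{1}{n+1} + \frac{n}{n+1}\right) = \theta,$$

making W2 unbiased. Also it is an easy exercise to find the variances, namely

$$\operatorname{Var}[Y_1] = \operatorname{Var}[Z_1] = \operatorname{Var}[Y_n] = \operatorname{Var}[Z_n] = \frac{n}{(n+1)^2(n+2)}.$$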
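Students who have not yet met the beta distribution can check these moments numerically from (3). This sketch assumes Python 3.8+ (for `math.comb`), uses a plain midpoint rule in place of a CAS, and takes n = 6 as an arbitrary example; `order_stat_density` and `moment` are our own illustrative helpers.

```python
from math import comb

def order_stat_density(z, k, n):
    """Density (3) of the k-th order statistic of n iid Uniform[0, 1] variables."""
    # n!/((k-1)!(n-k)!) equals n * C(n-1, k-1); here F(z) = z and f(z) = 1.
    return n * comb(n - 1, k - 1) * z ** (k - 1) * (1 - z) ** (n - k)

def moment(k, n, power, steps=20_000):
    """Midpoint-rule approximation of E[Z_k ** power] on [0, 1]."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        z = (i + 0.5) * h
        total += z ** power * order_stat_density(z, k, n)
    return total * h

n = 6
for k in (1, n):
    mean = moment(k, n, 1)
    var = moment(k, n, 2) - mean ** 2
    print(k, mean, var)
# Exact values: E[Z_1] = 1/(n+1), E[Z_n] = n/(n+1), and both variances n/((n+1)^2 (n+2)).
```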

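To find Var[W2], we need Cov[Y1, Yn] = Cov[Z1, Zn], and to find it we need the joint density of Z1 and Zn. From page 199 of Hogg and Craig (1995), the joint density for the order statistics $Z_i$ and $Z_j$, $i < j$, is

$$g_{ij}(z_i, z_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\,[F(z_i)]^{i-1}\,[F(z_j) - F(z_i)]^{j-i-1}\,[1 - F(z_j)]^{n-j}\, f(z_i)\, f(z_j), \qquad 0 < z_i < z_j < 1. \qquad (4)$$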

Hence the joint density for Z1 and Zn is

$$g_{1n}(z_1, z_n) = n(n-1)(z_n - z_1)^{n-2}, \qquad 0 < z_1 < z_n < 1.$$

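$\operatorname{Cov}[Z_1, Z_n] = E[Z_1 Z_n] - E[Z_1]\,E[Z_n]$, and here

$$E[Z_1 Z_n] = \int_0^1 \int_0^{z_n} z_1 z_n\, n(n-1)(z_n - z_1)^{n-2}\, dz_1\, dz_n = \frac{1}{n+2}$$

after some straightforward integration. Thus

$$\operatorname{Cov}[Z_1, Z_n] = \frac{1}{n+2} - \frac{1}{n+1}\cdot\frac{n}{n+1} = \frac{1}{(n+1)^2(n+2)}.$$

Finally, the variance of $W_2 = (Y_1 + Y_n)/2$ is

$$\operatorname{Var}[W_2] = \frac{1}{4}\left(\operatorname{Var}[Y_1] + \operatorname{Var}[Y_n] + 2\operatorname{Cov}[Y_1, Y_n]\right) = \frac{1}{4}\cdot\frac{2n + 2}{(n+1)^2(n+2)} = \frac{1}{2(n+1)(n+2)}.$$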
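The covariance and the variance of the midrange can also be checked by simulation. In this sketch (standard-library Python, with n = 5, an arbitrary seed, and 200,000 replications, all our own choices) we simulate the $Z_i$ directly, since the shift by θ affects neither quantity.

```python
import random

def midrange_moments(n, reps, seed=1):
    """Monte Carlo estimates of Cov[Z_1, Z_n] and Var[W_2] for Uniform[0, 1] samples.

    The shift Z_i = Y_i - theta + 1/2 changes neither quantity, so theta can be dropped.
    """
    rng = random.Random(seed)
    mins, maxs = [], []
    for _ in range(reps):
        zs = [rng.random() for _ in range(n)]
        mins.append(min(zs))
        maxs.append(max(zs))
    m1 = sum(mins) / reps
    mn = sum(maxs) / reps
    cov = sum((a - m1) * (b - mn) for a, b in zip(mins, maxs)) / reps
    mids = [(a + b) / 2 for a, b in zip(mins, maxs)]
    mw = sum(mids) / reps
    var_w2 = sum((w - mw) ** 2 for w in mids) / reps
    return cov, var_w2

n = 5
cov, var_w2 = midrange_moments(n, reps=200_000)
print(cov, 1 / ((n + 1) ** 2 * (n + 2)))    # simulated vs exact covariance
print(var_w2, 1 / (2 * (n + 1) * (n + 2)))  # simulated vs exact Var[W2]
```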

Remarks. The transformation used the function $Y_i - \theta + \tfrac{1}{2}$, which is a pivotal quantity, because it made the distribution of $Z_i$ independent of $\theta$, whatever $\theta$ was. Pivotal quantities, generally discussed in the more rigorous undergraduate texts, are also very helpful in constructing confidence intervals. (See Casella and Berger 1990 or Mood, Graybill, and Boes 1974 for further discussion.) Another way of finding Var[W2] is to first find the distribution of W2 using the 'distribution function technique' (see page 222 of Hogg and Tanis 1997).

### 3.2 The Sample Median

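Again in this discussion, we let $Z_i = Y_i - \theta + \tfrac{1}{2}$. For n = 2m + 1, the sample median is $W_3 = Y_{m+1}$. Using (3), the density for $Z_{m+1}$ is

$$g_{m+1}(z) = \frac{(2m+1)!}{m!\,m!}\, z^{m}(1 - z)^{m}, \qquad 0 < z < 1,$$

which is that of a beta distribution with $\alpha = \beta = m + 1$. Hence $E[Z_{m+1}] = \tfrac{1}{2}$; so $E[W_3] = E[Y_{m+1}] = \theta$. Moreover,

$$\operatorname{Var}[W_3] = \operatorname{Var}[Z_{m+1}] = \frac{(m+1)^2}{(2m+2)^2(2m+3)} = \frac{1}{4(2m+3)} = \frac{1}{4(n+2)}.$$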
If the students have not studied the beta distribution, a computer algebra system (CAS) such as Mathematica® or Maple® can be used to establish the results.

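For n = 2m, the median is $W_3 = (Y_m + Y_{m+1})/2$, but as before we will work with $Z_m$ and $Z_{m+1}$. From (4) the joint density of $Z_m$ and $Z_{m+1}$ is

$$g_{m,m+1}(z_m, z_{m+1}) = \frac{(2m)!}{(m-1)!\,(m-1)!}\, z_m^{\,m-1}(1 - z_{m+1})^{m-1}, \qquad 0 < z_m < z_{m+1} < 1.$$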

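Moreover, the marginal densities of $Z_m$ and $Z_{m+1}$ are, respectively,

$$g_m(z) = \frac{(2m)!}{(m-1)!\,m!}\, z^{m-1}(1 - z)^{m}, \qquad 0 < z < 1,$$

and

$$g_{m+1}(z) = \frac{(2m)!}{m!\,(m-1)!}\, z^{m}(1 - z)^{m-1}, \qquad 0 < z < 1.$$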

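These are beta distributions with $\alpha = m$, $\beta = m + 1$ and $\alpha = m + 1$, $\beta = m$, respectively. Hence

$$E[Z_m] = \frac{m}{2m+1}, \qquad E[Z_{m+1}] = \frac{m+1}{2m+1},$$

and

$$\operatorname{Var}[Z_m] = \operatorname{Var}[Z_{m+1}] = \frac{m(m+1)}{(2m+1)^2(2m+2)}.$$

Also,

$$E[W_3] = \theta - \frac{1}{2} + \frac{1}{2}\left(\frac{m}{2m+1} + \frac{m+1}{2m+1}\right) = \theta.$$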

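As before, to find Var[W3] we must evaluate

$$E[Z_m Z_{m+1}] = \int_0^1 \int_0^{z_{m+1}} z_m z_{m+1}\, \frac{(2m)!}{(m-1)!\,(m-1)!}\, z_m^{\,m-1}(1 - z_{m+1})^{m-1}\, dz_m\, dz_{m+1} = \frac{m(m+2)}{(2m+1)(2m+2)}.$$

The integration is straightforward, or a CAS can be used. Thus

$$\operatorname{Cov}[Z_m, Z_{m+1}] = \frac{m(m+2)}{(2m+1)(2m+2)} - \frac{m(m+1)}{(2m+1)^2} = \frac{m^2}{(2m+1)^2(2m+2)}.$$

Finally, algebra yields

$$\operatorname{Var}[W_3] = \frac{1}{4}\left(\operatorname{Var}[Z_m] + \operatorname{Var}[Z_{m+1}] + 2\operatorname{Cov}[Z_m, Z_{m+1}]\right) = \frac{m}{2(2m+1)(2m+2)} = \frac{n}{4(n+1)(n+2)}.$$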
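As a rough check on both median formulas, and on the perhaps surprising fact that n = 4 gives a smaller median variance than n = 5, the following sketch simulates the sample median. It assumes standard-library Python; the sample sizes, seed, and replication count are illustrative choices of ours.

```python
import random
import statistics

def median_variance_mc(n, reps, seed=2):
    """Monte Carlo variance of the median of n Uniform[0, 1] draws (the shift by theta is irrelevant)."""
    rng = random.Random(seed)
    meds = [statistics.median([rng.random() for _ in range(n)]) for _ in range(reps)]
    center = statistics.fmean(meds)
    return statistics.fmean((x - center) ** 2 for x in meds)

def median_variance_exact(n):
    """Var[W3] from Section 3.2: 1/(4(n+2)) for odd n, n/(4(n+1)(n+2)) for even n."""
    if n % 2 == 1:
        return 1 / (4 * (n + 2))
    return n / (4 * (n + 1) * (n + 2))

for n in (4, 5):
    print(n, median_variance_mc(n, 100_000), median_variance_exact(n))
```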

In most cases you would not make the students do all of the work indicated above, and again Mathematica® or Maple® could be used. However, after they have suggested the various estimators and found the mean and variance of at least two (including $\bar{X}$), an instructor could present all of the variances and have the students compare them by calculating the relative efficiency of one estimator to another. Having the students derive or check some of the efficiency results requires them to work with inequalities, something they may not have done for quite some time in their undergraduate careers.

### 3.3 Comparison of the Three Estimators of $\theta$

Table 1 gives the variances and shows the relative efficiencies of the estimators $\bar{X}$ (W1), the midrange (W2), and the median (W3). It contains some interesting results. Of course, for n = 2 the mean, median, and midrange are equivalent. For $n \ge 3$, the midrange is more efficient than the mean, which is more efficient than the median. That is, the average of the two extreme values is better than the average of all of the values, which is better than the median for estimating the center of this fixed-width density. At first this might seem amazing, but the midrange is based on two order statistics, each of which has a smaller variance than a single observation or any of the intermediate order statistics. It should also be noted that for the sample median, the case n = 2m is more efficient than the case n = 2m + 1, which has one more observation. In this case, two order statistics are better than one, even if the two are associated with a smaller sample size!

Table 1. Variances and Relative Efficiencies of the Mean, Median, and Midrange

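| Estimator | W1, mean | W2, midrange | W3, median |
|---|---|---|---|
| Variance, n even | $\frac{1}{12n}$ | $\frac{1}{2(n+1)(n+2)}$ | $\frac{n}{4(n+1)(n+2)}$ |
| Variance, n odd | $\frac{1}{12n}$ | $\frac{1}{2(n+1)(n+2)}$ | $\frac{1}{4(n+2)}$ |
| Var[W1] / Var[column], n even | 1 | $\frac{(n+1)(n+2)}{6n}$ | $\frac{(n+1)(n+2)}{3n^2}$ |
| Var[W1] / Var[column], n odd | 1 | $\frac{(n+1)(n+2)}{6n}$ | $\frac{n+2}{3n}$ |
| Var[W2] / Var[column], n even | $\frac{6n}{(n+1)(n+2)}$ | 1 | $\frac{2}{n}$ |
| Var[W2] / Var[column], n odd | $\frac{6n}{(n+1)(n+2)}$ | 1 | $\frac{2}{n+1}$ |

In the last four rows, each entry is the variance of the row estimator divided by the variance of the column estimator, so an entry greater than 1 means the column estimator is more efficient.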

NOTE: See Hogg and Tanis (1997), problem 10.1.11, for a comparison of the mean, median, and midrange for n = 3.
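Because every entry of Table 1 is a rational function of n, the table and the orderings discussed above can be verified exactly. This is a minimal sketch using Python's `fractions` module (our choice of tool; the authors mention Mathematica or Maple), and `variances` is a hypothetical helper name.

```python
from fractions import Fraction

def variances(n):
    """Exact (Var[W1], Var[W2], Var[W3]) from Table 1 for sample size n >= 2."""
    w1 = Fraction(1, 12 * n)
    w2 = Fraction(1, 2 * (n + 1) * (n + 2))
    if n % 2 == 0:
        w3 = Fraction(n, 4 * (n + 1) * (n + 2))
    else:
        w3 = Fraction(1, 4 * (n + 2))
    return w1, w2, w3

for n in range(2, 8):
    w1, w2, w3 = variances(n)
    # Var[W1]/Var[W2] and Var[W1]/Var[W3], as in the ratio rows of Table 1
    print(n, w1, w2, w3, w1 / w2, w1 / w3)
```

For n = 2 all three variances print as 1/24; for larger n the ratio Var[W1]/Var[W2] exceeds 1 and Var[W1]/Var[W3] falls below 1, matching the ordering described in the text.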

Remark. If the median is to be used for an odd sample size, don't throw away an observation! Statisticians do not recommend throwing away or ignoring any data. Instead, for a sample of size 2m +1, combine the strength of the median for an even sample size and the information from the odd observation. Taking a linear combination of the median for the first 2m observations and the last observation gives yet a better statistic:

Finally, when discussing the Cramér-Rao Lower Bound, this example can be used as one where the conditions of the theorem are not met: the support of the density depends on the parameter being estimated. The students can also show that all three estimators are consistent; each is unbiased and its variance converges to zero as n increases. In that regard, it is interesting to note that the variances of the mean and median are of order $1/n$, while that of the midrange is of order $1/n^2$. That is, for large n, the variance of the midrange is much smaller.

## 4. Sufficiency and Other Advanced and Varied Topics

The topic of sufficiency is very important for finding good estimators, but it can be very difficult for students to grasp. It is important because sufficient statistics are associated with good estimators. Unique MLEs, if they exist, are functions of sufficient statistics (see Rice 1995, p. 284, Corollary A). Conditioning an unbiased estimator on a sufficient statistic yields an unbiased estimator whose variance is no larger (the Rao-Blackwell theorem), and unbiased estimators that are functions of complete sufficient statistics are uniformly minimum variance unbiased estimators. The concept that a sufficient statistic contains all of the information in the sample needed for estimating the parameter sounds simple and intuitive. However, the definition of a sufficient statistic is very technical and not always written intuitively. Many authors use "W(X1, ..., Xn) is sufficient for $\theta$ if and only if the conditional distribution of X1, ..., Xn, given W = w, does not depend on $\theta$ for any value w of W." (See Miller and Miller 1999, Definition 10.3, and Mood, Graybill, and Boes 1974, Definition 15, p. 301.) The intuition in the definition is that once you have calculated the sufficient statistic, it has already sucked out all information about $\theta$ from the sample. Because sufficiency is so important, authors go to great lengths, devoting much text space to description and examples, in order to explain the concept. To help apply the concept of sufficiency to finding good estimators, authors present one or two factorization theorems. Even then, a student new to the material can have difficulty seeing that a factorization theorem follows intuitively from the concept. In Hogg and Craig (1995), over two pages of discussion precede the definition, and then the definition is presented in a functional form like a factorization theorem. Hogg and Tanis (1997) wait until the end of the book to discuss the topic, and use the factorization theorem as the definition.

Let us assume that sufficiency has been defined in the course, and that a one-statistic-one-parameter factorization theorem like Theorem 1 (p. 318) of Hogg and Craig (1995) has been presented. The random variable of (1) provides a good second or third example for discussing sufficiency. The likelihood function in (2) can be written as

$$L(\theta) = I_{[\theta - 1/2,\; \infty)}(y_1)\, I_{(-\infty,\; \theta + 1/2]}(y_n). \qquad (5)$$

From this expression, some students will say that you can't rewrite (5) so that there is a factorization, and hence there is no single sufficient statistic u1. Some may in fact come up with the idea that Y1 and Yn are jointly sufficient statistics. Example 1, page 348 of Hogg and Craig (1995), shows that Y1 and Yn are jointly sufficient for $\theta$. Those authors also point out that these are minimally sufficient and state that there can't be a single sufficient statistic. So the density (1) shows the student that not all examples work out nicely. The fact that Y1 and Yn are jointly sufficient statistics for $\theta$ helps explain why the variance of the midrange is so much smaller than those of W1 and W3.
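The practical meaning of joint sufficiency can be illustrated numerically: two samples of the same size with the same minimum and maximum produce exactly the same likelihood function, however their interior points differ. This is only an illustration, not a proof (the factorization theorem supplies that), and both samples are hypothetical ones of our own choosing.

```python
def likelihood(theta, sample):
    """Likelihood (2): 1 if every observation lies in [theta - 1/2, theta + 1/2], else 0."""
    return int(all(theta - 0.5 <= x <= theta + 0.5 for x in sample))

# Two hypothetical samples of size 4 sharing the same minimum and maximum.
a = [2.0, 2.1, 2.7, 2.8]
b = [2.0, 2.45, 2.5, 2.8]
grid = [1.9 + i / 1000 for i in range(1001)]   # theta values from 1.9 to 2.9
same = all(likelihood(t, a) == likelihood(t, b) for t in grid)
support = [t for t in grid if likelihood(t, a) == 1]
print(same)                        # True
print(min(support), max(support))  # roughly 2.3 and 2.5, i.e. Y_n - 1/2 and Y_1 + 1/2
```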

Remark. First, Mood, Graybill, and Boes (1974) give an alternate definition of sufficiency (Definition 16, p. 306). Paraphrased, it reads, "W(X1, ..., Xn) is sufficient for $\theta$ if and only if the conditional distribution of T given W = w does not depend on $\theta$ for any statistic T(X1, ..., Xn)" [and any value w of W]. They point out that it is 'particularly useful in showing that a statistic is not sufficient.' Second, in principle, if you cannot see a factorization, this does not mean that a factorization is impossible. Thus, sometimes you need to derive the distribution of T, given W, to demonstrate that W is not sufficient. As an exercise, the first author showed that W2 was not sufficient by considering the joint density for Y1, Y2, and Yn. He then used Y2 as the T statistic, Mathematica® for pictures, and calculus to find the distribution of T = Y2 given W2. Of course it depended on $\theta$.

The density (1) is also useful for illustrating other topics. Mood, Graybill, and Boes (1974) show that $\theta$ is a location parameter (example 37, p. 333), and then that W2 is the uniformly smallest mean-squared error estimator in the class of location invariant estimators (example 39, p. 335). Hogg and Craig (1995) essentially show that the pair (Y1, Yn) is not complete (example 2, p. 349), and discuss the idea of ancillary statistics (example 5, p. 357). We have noted that the MLE is not unique, as any weighted mean of $Y_n - \tfrac{1}{2}$ and $Y_1 + \tfrac{1}{2}$ will serve as an MLE. However, none of these is sufficient by itself, so no MLE of $\theta$ is a minimal sufficient statistic here; that role is played by the pair (Y1, Yn).

## 5. Confidence Intervals and Tests of Hypotheses

After discussing the properties of good estimators, it is easy to forget that we wanted to estimate the time the bus takes to travel between stops. Let us agree that W = W2 is our best estimator of $\theta$. It is unbiased, a maximum likelihood estimator, a function of the jointly sufficient statistics (if not sufficient itself), and has the smallest variance we have seen. Therefore we proceed confident that we are doing a good job of estimation, if not 'the best' job. This suggests that W2 ± c, where c is a constant such that $P(W_2 - c \le \theta \le W_2 + c) = 1 - \alpha$, might be the best confidence interval for $\theta$. However, this is not the case, although a good student might find it an interesting exercise to find that constant c by determining the distribution of W2.

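We illustrate a better way by first finding a good test of $H_0\colon \theta = \theta_0$ against $H_1\colon \theta \ne \theta_0$, and then we use the relationship between tests and confidence intervals to find a confidence interval for $\theta$. The likelihood ratio is given by

$$\lambda = \frac{L(\theta_0)}{L(\hat{\theta})},$$

where $\hat{\theta}$ is a maximum likelihood estimator such as W2. Note that $\lambda = 1$ provided $Y_n - \tfrac{1}{2} \le \theta_0 \le Y_1 + \tfrac{1}{2}$, but $\lambda = 0$ if $\theta_0 < Y_n - \tfrac{1}{2}$ or $\theta_0 > Y_1 + \tfrac{1}{2}$. Thus, strictly following the likelihood ratio criterion, we would reject $H_0$ if $Y_1 < \theta_0 - \tfrac{1}{2}$ or $Y_n > \theta_0 + \tfrac{1}{2}$. That is, we would accept $H_0$ if $\theta_0 - \tfrac{1}{2} \le Y_1 \le Y_n \le \theta_0 + \tfrac{1}{2}$. Clearly, if $\theta = \theta_0$ is true, we would never reject $H_0$ with this test, and the significance level is $\alpha = 0$. However, we must be concerned about the power of the test when $\theta \ne \theta_0$. This suggests that we could improve the power if we selected a constant c slightly less than 1/2 and accepted $H_0$ if $\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c$; otherwise reject $H_0$.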

Let us find c so that the significance level is $\alpha$. Thus we want, when $\theta = \theta_0$, that

$$P(\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c) = \prod_{i=1}^{n} P(\theta_0 - c \le X_i \le \theta_0 + c) = (2c)^n = 1 - \alpha.$$

Accordingly, $c = \tfrac{1}{2}(1 - \alpha)^{1/n}$. Going 'backwards' from this test, the corresponding $100(1 - \alpha)\%$ confidence interval for $\theta$ is $(Y_n - c, Y_1 + c)$, one which is based upon the jointly sufficient statistics for $\theta$. (In problem 4 of their Chapter VIII, Mood, Graybill, and Boes (1974) ask the student to verify that (Y1, Yn) is a confidence interval, and to find the confidence coefficient. In problem 21 of Chapter VIII, they give the student the more open-ended 'find a good confidence interval' for $\theta$.)
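The test and interval translate directly into code. In this sketch (standard-library Python; α = 0.05, n = 10, θ = 7, and the seed are our own illustrative choices) we compute c, form $(Y_n - c, Y_1 + c)$, and estimate its coverage by simulation. Note that the interval is empty whenever $Y_n - Y_1 > 2c$; such samples simply count as non-coverage, and the coverage is still $(2c)^n = 1 - \alpha$.

```python
import random

def critical_c(alpha, n):
    """Constant c solving (2c)^n = 1 - alpha."""
    return (1 - alpha) ** (1 / n) / 2

def confidence_interval(sample, alpha):
    """Interval (Y_n - c, Y_1 + c) obtained by inverting the test of H0: theta = theta0.

    The interval is empty whenever Y_n - Y_1 > 2c.
    """
    c = critical_c(alpha, len(sample))
    return max(sample) - c, min(sample) + c

def coverage(theta, n, alpha, reps, seed=3):
    """Monte Carlo estimate of the probability that the interval contains theta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        lo, hi = confidence_interval(sample, alpha)
        hits += lo < theta < hi
    return hits / reps

print(critical_c(0.05, 10))                                # about 0.4974
print(coverage(theta=7.0, n=10, alpha=0.05, reps=50_000))  # close to 0.95
```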

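The power of this test is easy for a student to compute by evaluating

$$K(\theta) = 1 - P(\theta_0 - c \le Y_1 \le Y_n \le \theta_0 + c;\ \theta).$$

Clearly, when $\theta = \theta_0$, $K(\theta_0) = \alpha$. Of course, $K(\theta)$ is symmetric about $\theta_0$, and hence we need only consider $K(\theta)$ when $\theta > \theta_0$. If $\theta \ge \theta_0 + c + \tfrac{1}{2}$, then every $X_i \ge \theta - \tfrac{1}{2} \ge \theta_0 + c$, so (with probability one) $Y_n > \theta_0 + c$ and $K(\theta) = 1$. So when $\theta_0 < \theta < \theta_0 + c + \tfrac{1}{2}$,

$$K(\theta) = \begin{cases} 1 - (2c)^n = \alpha, & \theta_0 < \theta \le \theta_0 + \tfrac{1}{2} - c, \\ 1 - \left(\theta_0 + c + \tfrac{1}{2} - \theta\right)^n, & \theta_0 + \tfrac{1}{2} - c < \theta < \theta_0 + c + \tfrac{1}{2}. \end{cases}$$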
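The power function can be coded and checked against simulation. The sketch below is ours; it assumes standard-library Python and the illustrative values θ0 = 0, n = 10, and α = 0.05. The function `power_exact` uses the fact that, under θ, each observation lands in the acceptance region $[\theta_0 - c, \theta_0 + c]$ with probability equal to the length of its overlap with $[\theta - \tfrac{1}{2}, \theta + \tfrac{1}{2}]$.

```python
import random

def power_exact(theta, theta0, c, n):
    """K(theta) = 1 - P(theta0 - c <= Y_1 <= Y_n <= theta0 + c; theta)."""
    # Length of the part of [theta - 1/2, theta + 1/2] inside [theta0 - c, theta0 + c];
    # since the density is 1 there, this is each observation's acceptance probability.
    overlap = min(theta + 0.5, theta0 + c) - max(theta - 0.5, theta0 - c)
    return 1 - max(overlap, 0.0) ** n

def power_mc(theta, theta0, c, n, reps, seed=4):
    """Monte Carlo estimate of K(theta): the fraction of samples that reject H0."""
    rng = random.Random(seed)
    rejects = 0
    for _ in range(reps):
        xs = [rng.uniform(theta - 0.5, theta + 0.5) for _ in range(n)]
        rejects += not (theta0 - c <= min(xs) and max(xs) <= theta0 + c)
    return rejects / reps

n, alpha, theta0 = 10, 0.05, 0.0
c = (1 - alpha) ** (1 / n) / 2
for theta in (0.0, 0.05, 0.1, 0.2, 0.5, 1.0):
    print(theta, power_exact(theta, theta0, c, n))
```

The printed power stays at α for θ within 1/2 − c of θ0 and equals 1 once θ ≥ θ0 + c + 1/2 (about 0.997 here).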

## 6. Possible Likelihood Ratio Exercises When the Range Is Unknown

We present several exercises when considering the uniform distribution on , where is unknown. Thus, . These are analogous to problems associated with the normal distribution when is unknown, and bright students can often think of the two-sample and k-sample problems on their own (possibly with a little direction from the instructor). In each of the following situations, the student is asked to show that the likelihood ratio equals the form given. Students may find these 'easy' or 'hard' depending on their insight.

### 6.1 One Sample: Test Against , Unknown

Then

where $Y_1 \le Y_2 \le \cdots \le Y_n$ are the order statistics of a random sample of size n.

### 6.2 One Sample: Test Against , Unknown

Using the same notation as in Exercise 6.1,

### 6.3 Two Samples: Test Against , and Unknown

For convenience, let n = n1 = n2 and and be the order statistics of the independent random samples. Then

### 6.4 Two Samples: Test Against , Given That , but Unknown

Using the same assumptions and notation of Exercise 6.3,

### 6.5 k Samples: Test Against All Alternatives, When Are Unknown

For convenience, let all k independent random samples have the same sample size n, and be the order statistics of the ith sample. Then

### 6.6 k Samples: Test Against All Alternatives, Given That , but Unknown

Using the same assumptions and notation of Exercise 6.5,

To determine the significance level of each of the above tests, which rejects $H_0$ if $\lambda \le k$, we need to determine the distribution of $\lambda$ or of some appropriate function of $\lambda$. These are much more difficult exercises.

## Acknowledgments and Requests

First we must thank the referees, who made great suggestions; one of them caught a major error in one of our statements. We certainly appreciate their time and effort. The estimator in the Remark in Section 3.3 is theirs. We also thank the other authors whom we have referenced and who have dabbled with this interesting one-parameter example. Their work has contributed to this paper.

Although the first author was dismayed when he found out that 'his' wonderful example was already in some of the literature (some published over 40 years ago; see Hogg and Craig (1956) and the 1959 edition of their book), he realized that he was not the only person to think this example was a good one. He then contacted the second author about presenting many of the applications of these uniform distributions in one document. When the first author later found the example in Mood, Graybill, and Boes (1974), a book he had used at least eleven years prior to his 'creating' the example, he almost thought that he had subconsciously plagiarized! However, with the continued rise in importance of statistics education and the fact that no single textbook has used this uniform distribution to any extent, the authors felt the project was still worthwhile.

The authors have one request: readers who find uniform distributions of the type considered here, referenced in any book or article, should contact the first author with the references. This is especially true of those who have written the book or article themselves! We have looked at many texts, but certainly not all of them.

## References

Casella, G., and Berger, R. L. (1990), Statistical Inference, Belmont, CA: Duxbury.

Hogg, R. V., and Craig, A. T. (1956), "Sufficient Statistics in Elementary Distribution Theory," Sankhyā, 17, 209.

----- (1995), Introduction to Mathematical Statistics (5th ed.), Englewood Cliffs, NJ: Prentice Hall.

Hogg, R. V., and Tanis, E. A. (1997), Probability and Statistical Inference (5th ed.), Englewood Cliffs, NJ: Prentice Hall.

Miller, I., and Miller, M. (1999), John E. Freund's Mathematical Statistics (6th ed.), Upper Saddle River, NJ: Prentice Hall.

Mood, A. M., Graybill, F. A., and Boes, D. C. (1974), Introduction to the Theory of Statistics (3rd ed.), New York: McGraw-Hill.

Rice, J. A. (1995), Mathematical Statistics and Data Analysis (2nd ed.), Belmont, CA: Duxbury.

Dexter C. Whittinghill
Department of Mathematics
Rowan University
201 Mullica Hill Rd.
Glassboro, NJ 08028

whittinghill@rowan.edu

Robert V. Hogg
Department of Statistics and Actuarial Science
University of Iowa
Iowa City, IA 52242