Ulf Olsson
Swedish University of Agricultural Sciences
Journal of Statistics Education Volume 13, Number 1 (2005), jse.amstat.org/v13n1/olsson.html
Copyright © 2005 by Ulf Olsson, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
Key Words: Generalized confidence interval.
Suppose that X is log-normal, so that Y = log(X) is Normally distributed with mean $\mu$ and variance $\sigma^2$. Note that the median of Y is then equal to the log of the median of X. In this paper we will assume that it is the arithmetic mean of X, and not the median of X, that we want to make inference about.
It is a rather straightforward task to use the log-transformed data Y to calculate a confidence interval for the expected value (mean value) of Y. We will discuss how this result can be used to calculate a confidence interval for the expected value of X.
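It holds (see, e.g., Zhou and Gao, 1997) that

$$\theta = E(X) = \exp\left(\mu + \frac{\sigma^2}{2}\right), \quad \text{i.e.,} \quad \log(\theta) = \mu + \frac{\sigma^2}{2}. \tag{1}$$

This means that the mean value of X is not equal to the antilog of the mean value of Y. An estimator of $\log(\theta)$ can be calculated from sample data as

$$\widehat{\log(\theta)} = \bar{y} + \frac{s^2}{2}, \tag{2}$$

where $\bar{y}$ and $s^2$ are the sample mean and sample variance of Y. An estimator of the variance of $\bar{y} + s^2/2$ is given by

$$\widehat{\operatorname{Var}}\left(\bar{y} + \frac{s^2}{2}\right) = \frac{s^2}{n} + \frac{s^4}{2(n-1)}; \tag{3}$$

see, e.g., Zhou and Gao (1997).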
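Equations (2) and (3) translate directly into a few lines of Python. The sketch below is our own illustration (the name `log_mean_estimate` is hypothetical) and assumes that $s^2$ is the usual sample variance with divisor $n-1$, as computed by `statistics.variance`.

```python
import statistics


def log_mean_estimate(y):
    """Return the estimate of log(theta) from eq. (2) and its estimated
    variance from eq. (3), given log-scale observations y = log(x)."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    ybar = statistics.mean(y)
    s2 = statistics.variance(y)  # sample variance with divisor n - 1
    estimate = ybar + s2 / 2                         # eq. (2)
    var_estimate = s2 / n + s2 ** 2 / (2 * (n - 1))  # eq. (3)
    return estimate, var_estimate


# Toy input (not from the paper): ybar = 2, s2 = 1, n = 3
est, var = log_mean_estimate([1.0, 2.0, 3.0])
print(est, var)
```

For this toy input the estimate is 2.5 and the estimated variance is 1/3 + 1/4.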
One sample of n = 40 observations was generated, using SAS (1997) software, from a log-normal distribution with parameters $\mu = 5$ and $\sigma^2 = 1$. By (1), the population mean of X is $\theta = e^{5.5} = 244.69$. The observations were transformed as Y = log(X). The raw sample data are given in Table 1, and the sample data are summarized in Table 2.
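The sampling step can be mimicked in Python as below. This is a sketch, not the author's SAS program: `random.lognormvariate(mu, sigma)` takes the mean and standard deviation of log(X), and with $\sigma^2 = 1$ we take $\sigma = 1$. The draws will not reproduce Table 1; the function name and the seed are our own choices.

```python
import math
import random

MU, SIGMA = 5.0, 1.0  # mean and standard deviation of Y = log(X)
N = 40


def simulate_lognormal_sample(n, mu, sigma, seed=None):
    """Draw n log-normal observations X with log(X) ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


# Population mean from (1): theta = exp(mu + sigma^2 / 2)
theta = math.exp(MU + SIGMA ** 2 / 2)

x = simulate_lognormal_sample(N, MU, SIGMA, seed=1)
y = [math.log(v) for v in x]  # the transformed data Y = log(X)
print(f"theta = {theta:.2f}, sample mean of X = {sum(x) / N:.2f}")
```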
Table 1. Raw sample data, X (n = 40).

|  |  |  |  |  |  |  |  |
|---|---|---|---|---|---|---|---|
| 914.9 | 1568.3 | 50.5 | 94.1 | 199.5 | 23.8 | 70.5 | 213.1 |
| 44.1 | 331.7 | 139.3 | 115.6 | 38.4 | 357.1 | 725.9 | 253.2 |
| 905.6 | 155.4 | 138.1 | 95.2 | 75.2 | 275.0 | 401.1 | 653.8 |
| 390.8 | 483.5 | 62.6 | 128.5 | 81.5 | 218.5 | 308.2 | 41.2 |
| 60.3 | 506.9 | 221.8 | 112.5 | 93.7 | 199.3 | 210.6 | 39.2 |

Table 2. Summary statistics for the sample data.

Variable | Mean | Median | St. dev. |
---|---|---|---|
X | 274.963 | 177.350 | 310.343 |
Y = log(X) | 5.127 | 5.170 | 1.004 |
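As a check on Table 2, the following Python sketch recomputes the summary statistics from the 40 values in Table 1. It uses the standard library's `statistics` module, whose `stdev` uses the $n-1$ divisor; we assume this is the divisor behind Table 2. Because Table 1 shows the data rounded to one decimal, the log-scale statistics may differ from Table 2 in the last digit.

```python
import math
import statistics

# Table 1 (n = 40), read row by row
X = [914.9, 1568.3, 50.5, 94.1, 199.5, 23.8, 70.5, 213.1,
     44.1, 331.7, 139.3, 115.6, 38.4, 357.1, 725.9, 253.2,
     905.6, 155.4, 138.1, 95.2, 75.2, 275.0, 401.1, 653.8,
     390.8, 483.5, 62.6, 128.5, 81.5, 218.5, 308.2, 41.2,
     60.3, 506.9, 221.8, 112.5, 93.7, 199.3, 210.6, 39.2]
Y = [math.log(v) for v in X]


def summarize(values):
    """Mean, median and standard deviation (divisor n - 1)."""
    return statistics.mean(values), statistics.median(values), statistics.stdev(values)


for name, values in (("X", X), ("Y = log(X)", Y)):
    mean, median, sd = summarize(values)
    print(f"{name:12s} {mean:9.3f} {median:9.3f} {sd:9.3f}")
```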
The naïve approach simply takes the antilog of the mean of the log-transformed data. For our example data, this gives the point estimate $e^{\bar{y}} = e^{5.127} = 168.51$. A standard 95% confidence interval for $\mu$ is calculated as $\bar{y} \pm t_{0.975,\,n-1}\, s/\sqrt{n}$, with limits [4.806, 5.448]. Back-transforming gives the limits for $\theta$ as $e^{4.806} = 122.24$ and $e^{5.448} = 232.29$.
Note that this confidence interval does not cover the population mean value, which is 244.69. Of course, this can occur by chance; after all, we have studied only one sample so far. However, it is noteworthy that the interval does not even cover the sample mean, which is 275.0. This illustrates the fact that the naïve method gives a biased estimator of $\theta$.
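The naïve calculation can be sketched in Python as follows. The function name is hypothetical, and the value $t_{0.975,39} \approx 2.0227$ is hard-coded from a t table, since Python's standard library has no t quantile function. With the rounded summary values from Table 2 it reproduces the log-scale limits above to the third decimal.

```python
import math


def naive_interval(ybar, s, n, t_quantile):
    """Naive approach: back-transform the usual t interval for mu = E(Y).

    The result is really an interval for exp(mu), the median of X,
    not for the mean theta.
    """
    half_width = t_quantile * s / math.sqrt(n)
    log_limits = (ybar - half_width, ybar + half_width)
    point = math.exp(ybar)
    return point, log_limits, tuple(math.exp(v) for v in log_limits)


# Summary values from Table 2; 2.0227 is t_{0.975, 39} taken from a t table.
point, log_limits, limits = naive_interval(5.127, 1.004, 40, 2.0227)
print(f"point estimate {point:.2f}, "
      f"log limits [{log_limits[0]:.3f}, {log_limits[1]:.3f}], "
      f"limits [{limits[0]:.2f}, {limits[1]:.2f}]")
```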
3.3 Cox method
Cox (quoted as "personal communication" in Land, 1971) has suggested that a confidence interval for $E(X) = \theta$ can be calculated in the following way. Calculate a confidence interval for $\log(\theta)$ as

$$\bar{y} + \frac{s^2}{2} \pm z\sqrt{\frac{s^2}{n} + \frac{s^4}{2(n-1)}}, \tag{4}$$

where z is the appropriate percentage point of the standard Normal distribution. The limits of this confidence interval are back-transformed to give a confidence interval for $\theta$. The method is valid for large samples. A similar approach has been suggested by Zhou, Gao, and Hui (1997) for the two-sample case.
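For the sample data, $\bar{y} = 5.127$ and $s^2 = 1.010$. The 95% confidence interval for $\log(\theta)$ is

$$5.127 + \frac{1.010}{2} \pm 1.96\sqrt{\frac{1.010}{40} + \frac{1.010^2}{2 \cdot 39}} = 5.632 \pm 0.384,$$

with confidence limits [5.248, 6.016]. Taking anti-logs, we obtain the limits of the 95% confidence interval for $\theta$ as $e^{5.248} = 190.24$ and $e^{6.016} = 409.82$, respectively. A point estimate of $\theta$ is $e^{\bar{y} + s^2/2} = e^{5.632} = 279.22$.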
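The Cox calculation for the example can be sketched in Python as below. The function name is our own; z is obtained from `statistics.NormalDist().inv_cdf`, and the inputs are the rounded values $\bar{y} = 5.127$ and $s^2 = 1.010$ quoted above, which reproduce the limits [5.248, 6.016].

```python
import math
from statistics import NormalDist


def cox_interval(ybar, s2, n, level=0.95):
    """Cox method, eq. (4): Normal-based interval for log(theta), back-transformed."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    est = ybar + s2 / 2                               # eq. (2)
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))  # square root of eq. (3)
    log_limits = (est - z * se, est + z * se)
    return math.exp(est), log_limits, tuple(math.exp(v) for v in log_limits)


point, log_limits, limits = cox_interval(5.127, 1.010, 40)
print(f"point estimate {point:.2f}, "
      f"log limits [{log_limits[0]:.3f}, {log_limits[1]:.3f}], "
      f"limits [{limits[0]:.2f}, {limits[1]:.2f}]")
```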
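In the modified Cox method, the Normal percentage point z in (4) is replaced by the corresponding percentage point $t_{0.975,\,n-1}$ of the t distribution with $n-1$ degrees of freedom. For the sample data, $\bar{y} = 5.127$ and $s^2 = 1.010$, and the 95% confidence interval for $\log(\theta)$ is $\bar{y} + s^2/2 \pm t_{0.975,\,39}\sqrt{s^2/n + s^4/(2(n-1))}$, with confidence limits [5.237, 6.027]. Taking anti-logs, we obtain the limits of the 95% confidence interval for $\theta$ as $e^{5.237} = 188.0$ and $e^{6.027} = 414.7$, respectively. For this sample size, the difference compared to the standard Cox method is small.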
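A corresponding sketch of the modified Cox method differs from the Cox sketch only in the percentage point. Here $t_{0.975,39} \approx 2.0227$ is an assumed table value supplied by the caller, and the function name is hypothetical. Because the inputs are rounded, the log-scale limits may differ from [5.237, 6.027] in the third decimal, and the back-transformed limits slightly more.

```python
import math


def modified_cox_interval(ybar, s2, n, t_quantile):
    """Modified Cox method: eq. (4) with z replaced by a t percentage point
    with n - 1 degrees of freedom (supplied by the caller)."""
    est = ybar + s2 / 2                               # eq. (2)
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))  # square root of eq. (3)
    log_limits = (est - t_quantile * se, est + t_quantile * se)
    return log_limits, tuple(math.exp(v) for v in log_limits)


# 2.0227 is t_{0.975, 39} from a t table (assumed value for n = 40)
log_limits, limits = modified_cox_interval(5.127, 1.010, 40, 2.0227)
print(f"log limits [{log_limits[0]:.3f}, {log_limits[1]:.3f}], "
      f"limits [{limits[0]:.1f}, {limits[1]:.1f}]")
```

Since $t_{0.975,\,n-1} > z$ for every n, this interval is always somewhat wider than the Cox interval, with the difference shrinking as n grows.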
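The generalized confidence interval (Weerahandi, 1993) for $\theta$ can be obtained by simulation as follows:

1. Calculate $\bar{y}$ and $s^2$ from the data.
2. For i = 1 to m (where m is large, for example m = 10000):
   - generate, independently, $Z \sim N(0, 1)$ and $U^2 \sim \chi^2_{n-1}$;
   - calculate $T_2 = \bar{y} - \dfrac{Z}{U/\sqrt{n-1}} \cdot \dfrac{s}{\sqrt{n}} + \dfrac{1}{2} \cdot \dfrac{s^2}{U^2/(n-1)}$.
3. For a 95% confidence interval, calculate the 2.5% and 97.5% percentiles of $T_2$ from the m simulated values. These are the lower and upper limits of a confidence interval for $\log(\theta)$, so a 95% confidence interval for the log-normal mean is obtained as $[\exp(T_{2;0.025}), \exp(T_{2;0.975})]$.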
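The steps above can be sketched in Python as follows. This is our own illustration, not the author's program: it assumes the pivotal quantity $T_2$ as written in step 2, draws $U^2$ as a Gamma variate with shape $(n-1)/2$ and scale 2 (which is a $\chi^2_{n-1}$ variate), and uses a simple nearest-rank percentile. The function name `generalized_ci` and the seed are hypothetical; results vary with the seed, so they will not reproduce any particular published interval.

```python
import math
import random


def generalized_ci(ybar, s2, n, m=10000, level=0.95, seed=None):
    """Simulation-based generalized confidence interval for theta = E(X).

    For each of m replicates draw Z ~ N(0, 1) and U^2 ~ chi-square(n - 1),
    compute T2, then take empirical percentiles of T2 and exponentiate.
    """
    if n < 2:
        raise ValueError("need n >= 2")
    rng = random.Random(seed)
    df = n - 1
    s = math.sqrt(s2)
    t2 = []
    for _ in range(m):
        z = rng.gauss(0.0, 1.0)
        u2 = rng.gammavariate(df / 2, 2.0)  # chi-square(df) as Gamma(df/2, scale 2)
        u = math.sqrt(u2)
        t2.append(ybar - z / (u / math.sqrt(df)) * s / math.sqrt(n)
                  + 0.5 * s2 / (u2 / df))
    t2.sort()
    alpha = 1 - level
    # Nearest-rank percentiles of the m simulated T2 values (limits for log(theta))
    lo = t2[round(alpha / 2 * (m - 1))]
    hi = t2[round((1 - alpha / 2) * (m - 1))]
    return math.exp(lo), math.exp(hi)


lo, hi = generalized_ci(5.127, 1.010, 40, seed=2005)
print(f"95% generalized CI for theta: [{lo:.2f}, {hi:.2f}]")
```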
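A large-sample approach, based on the Central Limit Theorem, calculates a confidence interval for $\theta$ directly from the untransformed data as

$$\bar{x} \pm z\,\frac{s_x}{\sqrt{n}}, \tag{5}$$

where $\bar{x}$ and $s_x$ are the sample mean and standard deviation of X. In our example, with $\bar{x} = 274.963$, $s_x = 310.343$, and n = 40 (Table 2), the 95% confidence interval has the limits [178.84, 371.16].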
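A minimal sketch of (5) in Python, assuming z = 1.96 (from `statistics.NormalDist`) and the Table 2 summary values, is shown below; the function name is hypothetical. It gives limits within about 0.05 of [178.84, 371.16], a difference presumably due to rounding of the inputs.

```python
import math
from statistics import NormalDist


def large_sample_interval(xbar, sd, n, level=0.95):
    """Eq. (5): CLT-based interval xbar +/- z * sd / sqrt(n) on the original scale."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sd / math.sqrt(n)
    return xbar - half_width, xbar + half_width


lo, hi = large_sample_interval(274.963, 310.343, 40)  # values from Table 2
print(f"[{lo:.2f}, {hi:.2f}]")
```

Nothing in (5) keeps the lower limit positive; for small, skewed samples it can become negative, as it does for the CO data in Table 4 (-6.11).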
Table 3. CO levels by sampling date.

CO level | Date |
---|---|
12.5 | 9/11/90 |
20 | 10/4/90 |
4 | 12/3/91 |
20 | 12/10/91 |
25 | 5/7/92 |
170 | 8/6/92 |
15 | 9/10/92 |
20 | 9/22/92 |
15 | 3/30/93 |
The 95% confidence intervals for the CO data in Table 3, using the different methods we have discussed, are given in Table 4. As expected, our modified Cox method gives a somewhat wider interval than the Cox method. For these data, the generalized confidence interval has an upper limit (153.19) that is well above those of the other methods.
Table 4. 95% confidence intervals for the CO data.

Method | Lower limit | Upper limit |
---|---|---|
Naïve approach | 9.15 | 40.95 |
Cox method | 14.15 | 68.49 |
Modified Cox method | 12.31 | 78.72 |
Large-sample approach | -6.11 | 73.11 |
Generalized confidence interval | 16.65 | 153.19 |
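To check the arithmetic in Table 4, the following Python sketch recomputes the four non-simulated rows from the CO data in Table 3. Two assumptions are ours: $t_{0.975,8} = 2.306$ is hard-coded from a t table (the standard library has no t quantile function), and the large-sample row uses this t percentage point rather than z, because that is the choice that matches the tabulated limits for these data. The generalized interval is left out because it depends on random simulation. The function name `table4_rows` is hypothetical.

```python
import math
import statistics
from statistics import NormalDist

# Table 3: CO levels (dates omitted)
CO = [12.5, 20.0, 4.0, 20.0, 25.0, 170.0, 15.0, 20.0, 15.0]

T_975_8 = 2.306004                   # assumed t_{0.975, 8} from a t table (n - 1 = 8)
Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def table4_rows(x, t_q, z_q):
    """Recompute the naive, Cox, modified Cox and large-sample 95% limits."""
    n = len(x)
    y = [math.log(v) for v in x]
    ybar, s2 = statistics.mean(y), statistics.variance(y)
    s = math.sqrt(s2)
    est = ybar + s2 / 2                               # eq. (2)
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))  # square root of eq. (3)
    xbar, sx = statistics.mean(x), statistics.stdev(x)
    naive_hw = t_q * s / math.sqrt(n)
    return {
        "Naive approach": (math.exp(ybar - naive_hw), math.exp(ybar + naive_hw)),
        "Cox method": (math.exp(est - z_q * se), math.exp(est + z_q * se)),
        "Modified Cox method": (math.exp(est - t_q * se), math.exp(est + t_q * se)),
        # Uses t rather than z: this is what matches Table 4 for these data
        "Large-sample approach": (xbar - t_q * sx / math.sqrt(n),
                                  xbar + t_q * sx / math.sqrt(n)),
    }


rows = table4_rows(CO, T_975_8, Z_975)
for method, (lo, hi) in rows.items():
    print(f"{method:22s} {lo:8.2f} {hi:8.2f}")
```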
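The simulation included the five confidence intervals discussed above: the naïve approach, the Cox method, the modified Cox method, the large-sample approach, and the generalized confidence interval. Each interval was compared to the population mean value $\theta = 244.69$, and the number of intervals below, covering, or above $\theta$ was counted. The results, summarized in Table 5, give for each sample size n the percentage of samples whose interval covers $\theta$, and the percentages of samples that produce intervals entirely above or below $\theta$.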
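A coverage study of this kind can be sketched as below, here for the Cox method only. This is an illustration under our own assumptions, not a reproduction of the paper's SAS simulation: it draws log-scale samples with $\mu = 5$ and $\sigma = 1$ (so $\theta = 244.69$), uses a replication count of our choosing, and classifies each interval as lying entirely below $\theta$, covering $\theta$, or lying entirely above $\theta$. The function names are hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def cox_limits(y, z):
    """Cox interval for theta from log-scale data y (eq. (4), back-transformed)."""
    n = len(y)
    ybar, s2 = statistics.mean(y), statistics.variance(y)
    est = ybar + s2 / 2
    se = math.sqrt(s2 / n + s2 ** 2 / (2 * (n - 1)))
    return math.exp(est - z * se), math.exp(est + z * se)


def coverage_study(n, reps, mu=5.0, sigma=1.0, level=0.95, seed=None):
    """Percentages of intervals below, covering, and above theta."""
    rng = random.Random(seed)
    theta = math.exp(mu + sigma ** 2 / 2)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    counts = {"below": 0, "covering": 0, "above": 0}
    for _ in range(reps):
        # Simulating Y = log(X) directly is equivalent to logging log-normal draws
        y = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = cox_limits(y, z)
        if hi < theta:
            counts["below"] += 1
        elif lo > theta:
            counts["above"] += 1
        else:
            counts["covering"] += 1
    return {k: 100 * v / reps for k, v in counts.items()}


print(coverage_study(n=50, reps=1000, seed=1))
```

Replacing `cox_limits` with any of the other interval functions gives the corresponding columns; with few replications the percentages carry noticeable Monte Carlo error.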
Table 5. Percentage of 95% confidence intervals lying below, covering, or above $\theta$, by sample size n.

n | Naïve: below | Naïve: covering | Naïve: above | Cox: below | Cox: covering | Cox: above | Modified Cox: below | Modified Cox: covering | Modified Cox: above |
---|---|---|---|---|---|---|---|---|---|
5 | 13.5 | 86.2 | 0.3 | 10.6 | 87.2 | 2.2 | 5.9 | 93.5 | 0.6 |
10 | 31.3 | 68.5 | 0.0 | 8.2 | 91.1 | 0.7 | 5.9 | 93.9 | 0.2 |
20 | 54.8 | 45.2 | 0.0 | 4.8 | 94.2 | 1.0 | 3.6 | 95.7 | 0.7 |
30 | 75.9 | 24.1 | 0.0 | 6.5 | 92.6 | 0.9 | 5.4 | 93.9 | 0.7 |
50 | 94.3 | 5.7 | 0.3 | 4.0 | 95.4 | 0.6 | 3.9 | 95.5 | 0.6 |
100 | 99.9 | 0.1 | 0.0 | 3.3 | 95.5 | 1.2 | 3.2 | 95.7 | 1.1 |
200 | 100.0 | 0.0 | 0.0 | 2.6 | 95.2 | 2.2 | 2.6 | 95.2 | 2.2 |
500 | 100.0 | 0.0 | 0.0 | 3.0 | 95.1 | 1.9 | 3.0 | 95.1 | 1.9 |
1000 | 100.0 | 0.0 | 0.0 | 3.3 | 94.4 | 2.3 | 3.3 | 94.4 | 2.3 |

Table 5 (continued).

n | Large-sample: below | Large-sample: covering | Large-sample: above | Generalized CI: below | Generalized CI: covering | Generalized CI: above |
---|---|---|---|---|---|---|
5 | 16.8 | 83.0 | 0.2 | 1.3 | 94.1 | 4.6 |
10 | 16.4 | 83.6 | 0.0 | 2.2 | 93.7 | 4.1 |
20 | 12.0 | 87.9 | 0.1 | 1.9 | 95.2 | 2.9 |
30 | 14.0 | 85.6 | 0.4 | 2.1 | 94.6 | 3.3 |
50 | 9.4 | 90.4 | 0.2 | 2.2 | 95.0 | 2.8 |
100 | 7.6 | 92.1 | 0.3 | 2.9 | 93.7 | 3.4 |
200 | 6.5 | 92.2 | 1.3 | 1.3 | 95.9 | 2.8 |
500 | 4.9 | 94.0 | 1.1 | 2.8 | 94.2 | 3.0 |
1000 | 4.8 | 93.8 | 1.4 | 2.3 | 95.8 | 1.9 |
The large-sample method, which is based on Central Limit Theorem arguments, gives a consistently lower coverage than 95%. Sample sizes of more than 200 seem to be needed to obtain a confidence level close to the nominal one. As expected, the intervals based on the naïve approach fail, since they are intervals for a different parameter, the median $e^{\mu}$ of X, rather than the mean $\theta$. The simulations were also run with standard deviations 0.5 and 2. All methods performed somewhat worse when the standard deviation increased, but the relationships between the methods remained unchanged.
It seems that the confidence intervals based on the modified Cox method work well for practical purposes. The calculations are simple and may be performed by hand, if desired. The generalized confidence interval approach also works well; a small disadvantage is that it requires a computer to simulate the sampling distribution.
Land, C. E. (1971), “Confidence intervals for linear functions of the normal mean and variance,” Annals of Mathematical Statistics, 42, 1187-1205.
SAS Institute Inc. (1997), SAS/STAT software: Changes and enhancements through Release 6.12, Cary, NC: SAS Institute Inc.
Weerahandi, S. (1993), “Generalized confidence intervals,” Journal of the American Statistical Association, 88, 899-905.
Zhou, X-H., and Gao, S. (1997), “Confidence intervals for the log-normal mean,” Statistics in Medicine, 16, 783-790.
Zhou, X-H., Gao, S., and Hui, S. L. (1997), “Methods for comparing the means of two independent log-normal samples,” Biometrics, 53, 1129-1135.
Ulf Olsson
Department of Biometry and Engineering
Swedish University of Agricultural Sciences
Box 7032, S-75007
Uppsala
Sweden
Ulf.Olsson@bt.slu.se