University of Central Florida

Journal of Statistics Education Volume 16, Number 1 (2008), jse.amstat.org/v16n1/brownstein.html

Copyright © 2008 by Naomi Brownstein and Marianna Pensky. All rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

**Key Words:** Transformations of variables; Estimation; Testing; Stress-strength model; Bayesian inference.

The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate the performance of this powerful tool on examples of estimation procedures, hypothesis testing, Bayes analysis and statistical inference for stress-strength systems. We argue that the tool of transformations not only should be used more widely in statistical research but should also become a routine part of calculus-based statistics courses. Finally, we provide sample problems for such a course as well as possible undergraduate research projects which utilize transformations of variables.

Transformations of random variables have been a standard tool in statistical inference. They have been used to solve a variety of statistical problems such as nonparametric density estimation, nonparametric regression, analysis of time series, construction of equivariant estimators and de-noising (see, e.g., Konishi (1991), Linton, Chen, Wang and Härdle (1997), Marron and Ruppert (1994), Ruppert and Cline (1994), Van Buuren (1997), and Yang and Marron (1999)). However, despite these numerous complex applications of transformation tools, one very simple application of transformations has been largely overlooked by the statistical community. This paper exploits the well-known fact that, when some of the parameters are treated as known, the majority of familiar probability distributions are just transformations of one another. Therefore, results of parametric statistical inference for one family of pdfs can be reproduced for another family without much work.

How many distributions are there in statistics? There are several dozen of them in the two volumes of "Continuous Univariate Distributions" (Johnson, Kotz and Balakrishnan (1994 and 1995)). Similar lists can be found in many other books devoted to a particular type of statistical inference, for example, Voinov and Nikulin (1993). In the majority of texts, estimators, tests and other statistical procedures are constructed for each distribution family separately, which leads to a great deal of calculation and sometimes to errors. Unifying those procedures can save hours of work.

The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate the performance of this powerful tool on examples of estimation procedures, hypothesis testing, Bayes analysis and statistical inference for stress-strength systems. To the best of the authors' knowledge, standard statistics courses usually mention transformations only in relation to the maximum likelihood estimation (MLE) procedure. We argue that the tool of transformations not only should be used more widely in statistical research but should also become a routine part of calculus-based statistics courses. The material of the present paper can be used as a supplement to such courses; for this purpose, we provide several sample homework problems and undergraduate research projects. Note that, although in what follows we consider only the case of a one-dimensional random variable, the theory extends naturally to random vectors. However, making this generalization here would unnecessarily complicate the presentation.

Consider a random variable *X* with the pdf *f(x|Θ)*, where the parameter *Θ* is a scalar or a vector. Suppose also that there exist a random variable *ξ*, a monotone differentiable function *u* and a one-to-one transformation *ν* such that *X = u(ξ)*, where the pdf *g(ξ|τ)* of *ξ* has a different parameterization from that of *X*, namely,

$$g(\xi\mid\tau) = f\big(u(\xi)\mid \nu(\tau)\big)\,|u'(\xi)|, \qquad \Theta = \nu(\tau). \tag{1}$$

Denoting $u^{-1} = v$ and $\nu^{-1} = \eta$, we rewrite (1) as

$$f(x\mid\Theta) = g\big(v(x)\mid \eta(\Theta)\big)\,|v'(x)|, \qquad \tau = \eta(\Theta). \tag{2}$$
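As a quick sanity check of (2), the sketch below evaluates both sides numerically for the Weibull/exponential pair. Table 1 is not reproduced in this text, so the parameterizations are assumptions chosen to agree with Example 1: the Weibull pdf $f(x\mid\sigma) = (\alpha/\sigma)(x/\sigma)^{\alpha-1}e^{-(x/\sigma)^\alpha}$ with known shape α, the exponential pdf $g(\xi\mid\lambda) = \lambda e^{-\lambda\xi}$, $v(x) = x^\alpha$ and $\eta(\sigma) = \sigma^{-\alpha}$. Function names and numerical values are illustrative.

```python
import math

ALPHA = 2.5  # known Weibull shape; illustrative value, not from the paper


def exp_pdf(xi: float, lam: float) -> float:
    """One-parameter exponential pdf g(xi | lam) with rate lam."""
    return lam * math.exp(-lam * xi) if xi > 0 else 0.0


def weibull_pdf(x: float, sigma: float, alpha: float = ALPHA) -> float:
    """Textbook Weibull pdf with known shape alpha and scale sigma."""
    if x <= 0:
        return 0.0
    z = x / sigma
    return (alpha / sigma) * z ** (alpha - 1) * math.exp(-z ** alpha)


def v(x: float, alpha: float = ALPHA) -> float:
    """v = u^{-1}: maps a Weibull observation to an exponential one."""
    return x ** alpha


def v_prime(x: float, alpha: float = ALPHA) -> float:
    """Derivative of v, i.e. the Jacobian factor |v'(x)| in (2)."""
    return alpha * x ** (alpha - 1)


def eta(sigma: float, alpha: float = ALPHA) -> float:
    """eta = nu^{-1}: maps the Weibull scale sigma to the exponential rate lambda."""
    return sigma ** (-alpha)


def weibull_pdf_via_transform(x: float, sigma: float) -> float:
    """Right-hand side of (2): g(v(x) | eta(sigma)) * |v'(x)|."""
    return exp_pdf(v(x), eta(sigma)) * abs(v_prime(x))


for x in (0.3, 1.0, 2.2):
    print(x, weibull_pdf(x, 1.7), weibull_pdf_via_transform(x, 1.7))
```

The two printed columns agree up to floating-point rounding, which is just (2) with $|v'(x)| = \alpha x^{\alpha-1}$.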

Now, let *g(ξ|τ)* be a popular distribution family, so that all sorts of statistical results are available for it. The objective of the present paper is to show how these results can be re-formulated for *f(x|Θ)*. Notice that the correspondence (1) is quite common but is not used to its full extent. For example, Table 1 contains distributions which can be obtained from the one- or two-parameter exponential distributions by appropriate re-parameterization. Our goal, however, is not to explore all possible correspondences of this sort but to provide a few examples which illustrate the general idea. The theory presented herein can be easily extended to many more kinds of statistical procedures and various other families of distributions.

Critics of this paper can justly remark that some of the results listed here can be obtained in a general form for, say, one- or two-parameter exponential families. The goal, however, is not to provide such a generalization but to supply a simple yet powerful methodological tool for modifying statistical procedures. The scale and location-scale families of exponential distributions are used here only as examples. In fact, the techniques described below can also be used for a distribution family which does not have a sufficient statistic.

The rest of the paper is organized as follows. In Section 2 we consider basic statistical inference for *f(x|Θ)* based on our knowledge of inference for *g(ξ|τ)*: sufficient statistics, maximum likelihood estimators (MLE), uniform minimum variance unbiased estimators (UMVUE), Bayes estimators, interval estimators and likelihood ratio tests. Section 3 deals with more sophisticated statistical inference, such as analysis of stress-strength systems and elicitation of noninformative priors. In both Sections 2 and 3, we provide examples of statistical procedures which can be obtained with little effort by using known results and applying the transformations of variables suggested in this paper. Section 4 presents several sample homework problems and undergraduate research projects. Section 5 concludes the paper with a discussion.

In this section we discuss the construction of basic statistical procedures for *f(x|Θ)* based on the relevant knowledge for *g(ξ|τ)*. We provide the statements followed by a few examples of their applications. The proofs of all statements are elementary and can be obtained by a simple change of variables. Note that, while Theorem 2 on the use of transformations for constructing MLEs is common knowledge, the rest of the statements, though very useful, rarely appear in statistical texts.

Let the relation between $f(x\mid\Theta)$ and $g(\xi\mid\tau)$ be given by (1) and (2), and let $\mathbf X = (X_1, X_2, \dots, X_n)$ and $\boldsymbol\xi = (\xi_1, \xi_2, \dots, \xi_n)$ be independent and identically distributed (i.i.d.) samples from those pdfs with $X_i = u(\xi_i)$, $i = 1, \dots, n$. Then the parametric family $f(x\mid\Theta)$ inherits all useful properties of the family $g(\xi\mid\tau)$. To simplify notation, in what follows $v(\mathbf X) = (v(X_1), v(X_2), \dots, v(X_n))$.

**Theorem 1 (Sufficient statistics)** *Let $T(\boldsymbol\xi) = T(\xi_1, \xi_2, \dots, \xi_n)$ be a scalar or vector valued sufficient statistic for the family of pdfs $g(\xi\mid\tau)$. Then $T^*(\mathbf X) = T(v(\mathbf X))$ is a sufficient statistic for the family of pdfs $f(x\mid\Theta)$.*
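For the Weibull/exponential pair (Weibull with known shape α and scale σ, exponential with rate λ; assumed parameterizations, since Table 1 is not reproduced here), $T(\boldsymbol\xi) = \sum_i \xi_i$ is sufficient for λ, so Theorem 1 gives $T^*(\mathbf X) = \sum_i X_i^\alpha$. A hedged numerical illustration of what sufficiency means here: two different samples sharing the same value of $T^*$ have a log-likelihood difference that does not depend on σ. The samples and the shape α = 2 are made up for the illustration.

```python
import math

ALPHA = 2.0  # known Weibull shape (illustrative assumption)


def weibull_loglik(sample, sigma, alpha=ALPHA):
    """Log-likelihood of a Weibull(shape alpha known, scale sigma) sample."""
    n = len(sample)
    return (n * math.log(alpha) - n * alpha * math.log(sigma)
            + (alpha - 1) * sum(math.log(x) for x in sample)
            - sum((x / sigma) ** alpha for x in sample))


def t_star(sample, alpha=ALPHA):
    """T*(X) = T(v(X)) with T(xi) = sum(xi) and v(x) = x**alpha (Theorem 1)."""
    return sum(x ** alpha for x in sample)


# Two different samples built to share T* = 1 + 4 + 9 = 14.
x_a = [1.0, 2.0, 3.0]
x_b = [math.sqrt(14 / 3)] * 3

diffs = [weibull_loglik(x_a, s) - weibull_loglik(x_b, s) for s in (0.5, 1.0, 2.0, 5.0)]
print(t_star(x_a), t_star(x_b))
print(diffs)  # the same value for every sigma: sigma enters only through T*
```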

**Theorem 2 (MLE)** *Let $U(\boldsymbol\xi) = U(\xi_1, \xi_2, \dots, \xi_n)$ be the MLE of $\tau$ based on the sample $\boldsymbol\xi$. Then $U^*(\mathbf X) = \nu(U(v(\mathbf X)))$ is the MLE of $\Theta$ based on the observations $\mathbf X$. If, moreover, $T = T(\boldsymbol\xi)$ is a sufficient statistic for $\tau$, so that the MLE of $\tau$ has the form $U(\boldsymbol\xi) = V(T)$, then the MLE of $\Theta$ is $U^*(\mathbf X) = \nu(V(T^*))$, where $T^* = T^*(\mathbf X)$ is defined in Theorem 1.*
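A minimal sketch of Theorem 2 under the same assumed Weibull/exponential parameterization: the exponential MLE $\hat\lambda = n/\sum_i \xi_i$ is evaluated at $\xi_i = X_i^\alpha$ and mapped through $\nu(\lambda) = \lambda^{-1/\alpha}$, giving $\hat\sigma = (n^{-1}\sum_i X_i^\alpha)^{1/\alpha}$. The result is compared with a direct numerical maximization of the Weibull log-likelihood by golden-section search, a generic optimizer chosen only for this illustration. Note that Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second.

```python
import math
import random

ALPHA = 1.5  # known Weibull shape (assumed)


def exp_rate_mle(xi):
    """MLE of the exponential rate: U(xi) = n / sum(xi)."""
    return len(xi) / sum(xi)


def weibull_scale_mle(x, alpha=ALPHA):
    """Theorem 2: nu(U(v(X))) with v(x) = x**alpha and nu(lam) = lam**(-1/alpha)."""
    lam_hat = exp_rate_mle([xi ** alpha for xi in x])
    return lam_hat ** (-1.0 / alpha)


def weibull_loglik(x, sigma, alpha=ALPHA):
    n = len(x)
    return (n * math.log(alpha / sigma)
            + (alpha - 1) * sum(math.log(xi / sigma) for xi in x)
            - sum((xi / sigma) ** alpha for xi in x))


def golden_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    r = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if f(c) > f(d):
            b, d = d, c          # maximum lies in [a, d]
            c = b - r * (b - a)
        else:
            a, c = c, d          # maximum lies in [c, b]
            d = a + r * (b - a)
    return (a + b) / 2


random.seed(1)
x = [random.weibullvariate(2.0, ALPHA) for _ in range(50)]  # scale 2.0, shape ALPHA
closed_form = weibull_scale_mle(x)
numeric = golden_max(lambda s: weibull_loglik(x, s), 0.01, 20.0)
print(closed_form, numeric)
```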

**Theorem 3 (UMVUE)** *Let $h(\tau)$ be a function of $\tau$ and let $V(T)$ be the UMVUE of $h(\tau)$ based on the sufficient statistic $T = T(\boldsymbol\xi)$. Then $V(T^*)$ is the UMVUE of $h^*(\Theta) = h(\eta(\Theta))$ based on the sufficient statistic $T^*(\mathbf X)$.*

**Corollary 1 (UMVUE)** *Let $\psi(\xi_1, \dots, \xi_s)$ be a function of the observations $\xi_1, \dots, \xi_s$ with expectation $E_\tau\,\psi(\xi_1, \dots, \xi_s)$ over the pdf $\prod_{j=1}^{s} g(\xi_j\mid\tau)$. If $V(T)$ is the UMVUE of $E_\tau\,\psi(\xi_1, \dots, \xi_s)$, then $V(T^*)$ is the UMVUE of $E_\Theta\,\psi(v(X_1), \dots, v(X_s))$.*
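Corollary 1 can be illustrated with the reliability $P(X > x_0)$ of a Weibull variable, again assuming known shape α, scale σ, and $v(x) = x^\alpha$. Take $s = 1$ and $\psi(\xi_1) = I(\xi_1 > x_0^\alpha)$, so that $E_\Theta\,\psi(v(X_1)) = P(X_1 > x_0)$. For the exponential family, the UMVUE of $P(\xi_1 > t) = e^{-\lambda t}$ based on $T = \sum_i \xi_i$ is the standard $(1 - t/T)^{n-1}$ for $T > t$ and 0 otherwise; the corollary then gives the Weibull UMVUE by substituting $t = x_0^\alpha$ and $T^* = \sum_i X_i^\alpha$. The simulation below only checks unbiasedness by Monte Carlo, and the constants are arbitrary.

```python
import math
import random

ALPHA = 2.0  # known Weibull shape (assumed)
SIGMA = 1.3  # true scale, used only to simulate data
X0 = 1.0     # reliability threshold x0


def reliability_umvue(x, x0, alpha=ALPHA):
    """UMVUE of P(X > x0) for a Weibull sample via Corollary 1.

    Exponential case: UMVUE of exp(-lam * t) is (1 - t/T)**(n-1) if T > t, else 0.
    Here t = x0**alpha and T is replaced by T* = sum(x_i**alpha).
    """
    n = len(x)
    t = x0 ** alpha
    t_star = sum(xi ** alpha for xi in x)
    return (1.0 - t / t_star) ** (n - 1) if t_star > t else 0.0


random.seed(7)
n, reps = 8, 40000
estimates = [reliability_umvue([random.weibullvariate(SIGMA, ALPHA) for _ in range(n)], X0)
             for _ in range(reps)]
mc_mean = sum(estimates) / reps
truth = math.exp(-(X0 / SIGMA) ** ALPHA)
print(mc_mean, truth)  # Monte Carlo mean of the estimator vs the true reliability
```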

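**Theorem 4 (Bayes)** *Let $U(\boldsymbol\xi) = U(\xi_1, \xi_2, \dots, \xi_n)$ be the Bayes estimator of $h(\tau)$ based on the sample $\boldsymbol\xi$ and the prior pdf $\pi(\tau)$. Then $U^*(\mathbf X) = U(v(\mathbf X))$ is the Bayes estimator (under the same loss function) of $h^*(\Theta) = h(\eta(\Theta))$ based on the sample $\mathbf X$ and the prior pdf $\pi(\eta(\Theta))\,|J_\eta(\Theta)|$, where $|J_\eta(\Theta)|$ is the absolute value of the Jacobian of the transformation $\eta(\Theta)$. Moreover, if $g(\tau\mid\boldsymbol\xi)$ is the posterior pdf of $\tau$ given $\boldsymbol\xi$ based on the prior pdf $\pi(\tau)$, then $g^*(\Theta\mid\mathbf X) = g(\eta(\Theta)\mid v(\mathbf X))\,|J_\eta(\Theta)|$ is the posterior pdf of $\Theta$ corresponding to the prior pdf $\pi(\eta(\Theta))\,|J_\eta(\Theta)|$.*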
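To make Theorem 4 concrete, the sketch below assumes a conjugate Gamma(A, B) prior (shape A, rate B) on the exponential rate λ, a hypothetical choice not taken from the paper, whose posterior given $\boldsymbol\xi$ is Gamma$(A + n, B + \sum_i \xi_i)$. With the assumed Weibull parameterization, $\eta(\sigma) = \sigma^{-\alpha}$ and $|J_\eta(\sigma)| = \alpha\sigma^{-\alpha-1}$, so Theorem 4 gives the posterior of σ as $g(\eta(\sigma)\mid v(\mathbf x))\,|J_\eta(\sigma)|$. The check is that this density differs from (Weibull likelihood) × (induced prior) only by a constant, as Bayes' rule requires.

```python
import math

ALPHA = 2.0      # known Weibull shape (assumed)
A, B = 3.0, 2.0  # hypothetical Gamma(shape A, rate B) prior on lambda


def log_gamma_pdf(lam, shape, rate):
    """Log of the Gamma(shape, rate) density at lam > 0."""
    return (shape * math.log(rate) + (shape - 1) * math.log(lam)
            - rate * lam - math.lgamma(shape))


def eta(sigma):
    """lambda = eta(sigma) = sigma**(-alpha)."""
    return sigma ** (-ALPHA)


def log_abs_jacobian(sigma):
    """log |d eta / d sigma| = log(alpha * sigma**(-alpha - 1))."""
    return math.log(ALPHA) - (ALPHA + 1) * math.log(sigma)


def log_prior_star(sigma):
    """Induced prior on sigma: pi(eta(sigma)) * |J_eta(sigma)|."""
    return log_gamma_pdf(eta(sigma), A, B) + log_abs_jacobian(sigma)


def log_posterior_star(sigma, x):
    """Theorem 4: g(eta(sigma) | v(x)) * |J_eta(sigma)|, where the exponential
    posterior g(lambda | xi) is the conjugate Gamma(A + n, B + sum(xi))."""
    t_star = sum(xi ** ALPHA for xi in x)
    return log_gamma_pdf(eta(sigma), A + len(x), B + t_star) + log_abs_jacobian(sigma)


def weibull_loglik(x, sigma):
    return sum(math.log(ALPHA / sigma) + (ALPHA - 1) * math.log(xi / sigma)
               - (xi / sigma) ** ALPHA for xi in x)


x = [0.8, 1.4, 2.1, 0.5, 1.9]
# Posterior = likelihood * prior / constant, so this log-gap must not depend on sigma.
gaps = [log_posterior_star(s, x) - (weibull_loglik(x, s) + log_prior_star(s))
        for s in (0.6, 1.0, 1.5, 2.5)]
print(gaps)
```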

Theorems 2-4 and Corollary 1 concern the construction of various types of point estimators. However, interval estimation and hypothesis testing procedures can be modified in a similar way. We give a few examples below.

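**Theorem 5 (Interval estimation)** *Let $\varphi(\tau)$ be a parametric function of interest and let $(A(\boldsymbol\xi), B(\boldsymbol\xi))$ be an interval estimator of $\varphi(\tau)$ corresponding to the confidence level $(1-\gamma)$, i.e.*

$$P_\tau\big(A(\boldsymbol\xi) \le \varphi(\tau) \le B(\boldsymbol\xi)\big) = 1-\gamma.$$

*Let*

$$A^*(\mathbf X) = A(v(\mathbf X)), \qquad B^*(\mathbf X) = B(v(\mathbf X));$$

*then $(A^*(\mathbf X), B^*(\mathbf X))$ is an interval estimator of $\varphi(\eta(\Theta))$ corresponding to the confidence level $(1-\gamma)$.*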
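A hedged illustration of Theorem 5 with a one-sided interval (upper endpoint $+\infty$), under the same assumed Weibull/exponential parameterization. For the exponential rate, $n\lambda\,\xi_{(1)}$ has the standard exponential distribution, so $\lambda \le -\ln\gamma/(n\xi_{(1)})$ with probability $1-\gamma$. Taking $\varphi(\lambda) = \lambda^{-1/\alpha}$, which is decreasing and equals σ at $\lambda = \eta(\sigma)$, and substituting $\xi_i = X_i^\alpha$ gives a lower confidence bound for the Weibull scale. This bound uses only the sample minimum and is chosen because it needs no chi-square quantiles, not because it is efficient; the Monte Carlo loop only checks its coverage.

```python
import math
import random

ALPHA = 1.8   # known Weibull shape (assumed)
GAMMA = 0.05  # 1 - confidence level


def lam_upper_bound(xi, gamma=GAMMA):
    """Exact (1 - gamma) upper bound for an exponential rate lam:
    n * lam * min(xi) ~ Exp(1), so P(lam <= -ln(gamma) / (n * min(xi))) = 1 - gamma."""
    return -math.log(gamma) / (len(xi) * min(xi))


def sigma_lower_bound(x, alpha=ALPHA, gamma=GAMMA):
    """Theorem 5 with phi(lam) = lam**(-1/alpha): phi is decreasing, so
    lam <= U(xi) iff phi(lam) >= U(xi)**(-1/alpha); evaluate at xi = v(x) = x**alpha."""
    return lam_upper_bound([xi ** alpha for xi in x], gamma) ** (-1.0 / alpha)


random.seed(3)
sigma_true, n, reps = 2.0, 10, 20000
covered = sum(
    sigma_lower_bound([random.weibullvariate(sigma_true, ALPHA) for _ in range(n)]) <= sigma_true
    for _ in range(reps)
)
coverage = covered / reps
print(coverage)  # Monte Carlo coverage; the nominal level is 0.95
```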

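**Theorem 6 (Likelihood ratio test)** *Let $U(\boldsymbol\xi) = U(\xi_1, \xi_2, \dots, \xi_n)$ be a likelihood ratio test (LRT) statistic for testing the hypothesis $H_0: \tau \in \Omega_0$ versus $H_1: \tau \in \Omega_0^C$. Consider the sets $\Theta_0$ and $\Theta_0^C$ such that*

$$\Theta_0 = \nu(\Omega_0) = \{\Theta : \eta(\Theta) \in \Omega_0\}, \qquad \Theta_0^C = \nu(\Omega_0^C).$$

*Then $U^*(\mathbf X) = U(v(\mathbf X))$ is an LRT statistic for testing the hypothesis $H_0^*: \Theta \in \Theta_0$ versus $H_1^*: \Theta \in \Theta_0^C$. Moreover, if the test that rejects $H_0$ when $U(\boldsymbol\xi) > C_\gamma$ has level $\gamma$, then the test that rejects $H_0^*$ when $U^*(\mathbf X) > C_\gamma$ has level $\gamma$ as well.*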
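For Theorem 6, the sketch below computes $-2\ln$ of the likelihood ratio for $H_0: \lambda = \lambda_0$ in the exponential model evaluated at $v(\mathbf x) = (x_1^\alpha, \dots, x_n^\alpha)$, and compares it with the same statistic computed directly from the Weibull likelihood for $H_0^*: \sigma = \sigma_0$, where $\lambda_0 = \sigma_0^{-\alpha}$. The data and α are made up, and the Weibull parameterization (known shape α, scale σ) is an assumption, as before.

```python
import math

ALPHA = 2.0  # known Weibull shape (assumed)


def exp_loglik(xi, lam):
    return len(xi) * math.log(lam) - lam * sum(xi)


def exp_lrt(xi, lam0):
    """-2 log LRT for H0: lam = lam0 vs H1: lam != lam0 (exponential rate)."""
    lam_hat = len(xi) / sum(xi)
    return -2.0 * (exp_loglik(xi, lam0) - exp_loglik(xi, lam_hat))


def weibull_loglik(x, sigma, alpha=ALPHA):
    return sum(math.log(alpha / sigma) + (alpha - 1) * math.log(xi / sigma)
               - (xi / sigma) ** alpha for xi in x)


def weibull_lrt_direct(x, sigma0, alpha=ALPHA):
    """-2 log LRT for H0*: sigma = sigma0, computed from the Weibull likelihood itself."""
    sigma_hat = (sum(xi ** alpha for xi in x) / len(x)) ** (1.0 / alpha)
    return -2.0 * (weibull_loglik(x, sigma0) - weibull_loglik(x, sigma_hat))


x = [0.9, 1.7, 0.4, 2.3, 1.1, 1.5]
sigma0 = 1.0
via_transform = exp_lrt([xi ** ALPHA for xi in x], sigma0 ** (-ALPHA))  # U(v(X))
direct = weibull_lrt_direct(x, sigma0)
print(via_transform, direct)
```

The two numbers coincide because the factors $|v'(x_i)|$ appear in both the numerator and the denominator of the likelihood ratio and cancel.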

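**Example 1.** Consider the task of finding the UMVUE of ln σ given a sample **X** from the Weibull distribution with known shape parameter α (see Table 1). Using the fact that $\sigma = \nu(\lambda) = \lambda^{-1/\alpha}$, we obtain $\ln\sigma = -\alpha^{-1}\ln\lambda = \alpha^{-1}\ln(1/\lambda)$. Since $v(x) = x^\alpha$, the variable $\xi = X^\alpha$ has the one-parameter exponential distribution with rate λ (mean $1/\lambda = \sigma^\alpha$), and the UMVUE of $\ln(1/\lambda) = -\ln\lambda$ based on the sample $\boldsymbol\xi$ is of the form (see Voinov and Nikulin (1993), page 359)

$$V(T) = \ln T - \psi(n), \tag{3}$$

where $T = \sum_{i=1}^n \xi_i$ and $\psi(n) = -\gamma_E + \sum_{k=1}^{n-1} k^{-1}$ is the value at $n$ of Euler's psi-function $\psi(x) = \frac{d}{dx}\ln\Gamma(x)$, $\gamma_E$ being the Euler–Mascheroni constant. Theorems 1 and 3 with $v(x) = x^\alpha$ imply that the UMVUE of ln σ is

$$\frac{1}{\alpha}\left(\ln \sum_{i=1}^n X_i^\alpha - \psi(n)\right).$$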
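The unbiasedness of the UMVUE of ln σ, namely $\alpha^{-1}(\ln\sum_i X_i^\alpha - \psi(n))$, can be checked by simulation. The reasoning behind it is that $\sum_i X_i^\alpha$ has a gamma distribution with shape n and scale $\sigma^\alpha$, so $E\ln\sum_i X_i^\alpha = \psi(n) + \alpha\ln\sigma$. The sketch below evaluates ψ at an integer argument through the finite-sum formula and assumes the Weibull scale/shape parameterization used in the earlier sketches; parameter values are illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant gamma_E


def digamma_int(n):
    """psi(n) for a positive integer n: -gamma_E + sum_{k=1}^{n-1} 1/k."""
    return -EULER_GAMMA + sum(1.0 / k for k in range(1, n))


def umvue_log_sigma(x, alpha):
    """(ln T* - psi(n)) / alpha with T* = sum(x_i**alpha)."""
    t_star = sum(xi ** alpha for xi in x)
    return (math.log(t_star) - digamma_int(len(x))) / alpha


random.seed(11)
alpha, sigma, n, reps = 1.5, 2.0, 6, 50000
estimates = [umvue_log_sigma([random.weibullvariate(sigma, alpha) for _ in range(n)], alpha)
             for _ in range(reps)]
mc_mean = sum(estimates) / reps
print(mc_mean, math.log(sigma))  # Monte Carlo mean vs the true ln(sigma)
```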

Note that the formula for the UMVUE of ln σ is not listed in the table for the Weibull distribution in the most comprehensive existing collection of UMVUEs (Voinov and Nikulin (1993)); neither are estimators for many other functions of σ that can be easily obtained using our simple technique. Moreover, the method of transformations of random variables is not even listed among the half a dozen techniques suggested there for the derivation of UMVUEs. Had the authors introduced this very simple idea in the book, its tables could have been more comprehensive and the book much shorter.

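**Example 2.** The tool of transformations is very useful for solving simple problems in various statistics courses. For example, consider problem 8.17 in Casella and Berger (2002), which we re-formulate in the notation of the present paper. When $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_m)$ are independent samples from one-parameter beta distributions (see Table 1) with parameters $\sigma_1$ and $\sigma_2$, that is, with pdfs $\sigma_1 x^{\sigma_1 - 1}$ and $\sigma_2 y^{\sigma_2 - 1}$ on $(0, 1)$, one is asked to show that the LRT of $H_0: \sigma_1 = \sigma_2$ versus $H_1: \sigma_1 \neq \sigma_2$ can be based on the statistic

$$T^* = \frac{\sum_{i=1}^n \ln X_i}{\sum_{i=1}^n \ln X_i + \sum_{j=1}^m \ln Y_j} \tag{4}$$

and to find the distribution of $T^*$ when $H_0$ is true. For this purpose, turn to problem 8.6 in the same section of Casella and Berger (2002), where the samples $(X_1, \dots, X_n)$ and $(Y_1, \dots, Y_m)$ come from one-parameter exponential distributions, and one is asked to show that the LRT of the equality of the two parameters can be based on the statistic

$$T = \frac{\sum_{i=1}^n X_i}{\sum_{i=1}^n X_i + \sum_{j=1}^m Y_j}$$

and to find the distribution of $T$ when $H_0$ is true. Solving problem 8.6, we discover that the rejection region for $H_0$ is of the form $T^n(1-T)^m \le c$ and that, under $H_0$, $T$ has the beta$(n, m)$ distribution. Since $-\ln X_i$ and $-\ln Y_j$ are exponential with rates $\sigma_1$ and $\sigma_2$, Theorem 6 with $v(x) = -\ln x$ shows that the LRT of problem 8.17 rejects $H_0$ when $(T^*)^n(1-T^*)^m \le c$, and that $T^*$ has the beta$(n, m)$ distribution under $H_0$.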
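The correspondence in Example 2 can be checked numerically: with $v(x) = -\ln x$, the statistic (4) computed from beta$(\sigma, 1)$ samples equals the problem 8.6 statistic computed from the transformed samples, and under $H_0$ its Monte Carlo mean should be close to $n/(n+m)$, the mean of the beta$(n, m)$ distribution. Sample sizes, σ and the seed are arbitrary; Python's `random.betavariate` draws the beta samples, and the helper names are illustrative.

```python
import math
import random


def t_beta(x, y):
    """Statistic (4): sum(ln x) / (sum(ln x) + sum(ln y))."""
    sx = sum(math.log(a) for a in x)
    sy = sum(math.log(b) for b in y)
    return sx / (sx + sy)


def t_exp(xi, zeta):
    """Statistic of problem 8.6: sum(xi) / (sum(xi) + sum(zeta))."""
    return sum(xi) / (sum(xi) + sum(zeta))


def v(sample):
    """v(x) = -ln x maps a beta(sigma, 1) observation to an exponential one with rate sigma."""
    return [-math.log(s) for s in sample]


random.seed(5)
n, m, sigma = 4, 6, 2.5  # H0 holds: both samples are beta(sigma, 1)
x = [random.betavariate(sigma, 1.0) for _ in range(n)]
y = [random.betavariate(sigma, 1.0) for _ in range(m)]
print(t_beta(x, y), t_exp(v(x), v(y)))  # the same number

# Under H0, T* should follow beta(n, m), whose mean is n / (n + m).
reps = 20000
sims = [t_beta([random.betavariate(sigma, 1.0) for _ in range(n)],
               [random.betavariate(sigma, 1.0) for _ in range(m)])
        for _ in range(reps)]
print(sum(sims) / reps, n / (n + m))
```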

Applications of transformations are not limited to elementary statistical inference procedures. In what follows, we consider two models where transformations of variables can provide a final result with very little effort: the stress-strength system and derivation of noninformative priors. However, we want to point out that applications of transformation of variables are by no means limited to these two cases.

Consider estimation of the probability $R = P(X_1 < X_2)$ on the basis of independent samples from the distributions of $X_1$ and $X_2$.

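Let, as before, the random variables $X_i$, $i = 1, 2$, be obtained as $X_i = u(\xi_i)$ from random variables $\xi_i$ whose pdfs $g_i(\xi_i\mid\tau_i)$ are related to the pdfs $f_i(x\mid\Theta_i)$ of $X_i$ by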

$$g_i(\xi_i\mid\tau_i) = f_i\big(u(\xi_i)\mid \nu(\tau_i)\big)\,|u'(\xi_i)|, \qquad \Theta_i = \nu(\tau_i), \tag{5}$$

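where $\nu$ is a one-to-one transformation. As before, denote $u^{-1} = v$ and $\nu^{-1} = \eta$, and observe that if the function $u$ is monotonically increasing, then $R = P(X_1 < X_2) = P(u(\xi_1) < u(\xi_2)) = P(\xi_1 < \xi_2) = R^*$, so that any point or interval estimator of $R^*$ constructed from samples of $\xi_1$ and $\xi_2$ yields an estimator of $R$ once every observation $x$ is replaced by $v(x)$. If $u$ is monotonically decreasing, the same argument gives $R = P(\xi_1 > \xi_2) = 1 - R^*$.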

**Example 3.** Consider, for example, the situation described in Surles and Padgett (1998), where both $X_1$ and $X_2$ have Burr type X distributions (see Table 1). The objective of the authors is to find the MLE and the UMVUE of $R$ and also to develop lower confidence bounds for $R$. The authors carry out all details of the statistical inference, while all they need is to exploit known results for estimation of $R$ in the case when $\xi_i$, $i = 1, 2$, have one-parameter exponential distributions.
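A minimal sketch of Example 3, assuming the one-parameter Burr type X cdf $F(x\mid\sigma) = (1 - e^{-x^2})^\sigma$, $x > 0$ (an assumption, since Table 1 is not reproduced here). Under it, $\xi = -\ln(1 - e^{-X^2})$ is exponential with rate σ, the same transformation quoted in Example 5. Because this $v$ is decreasing, $R = P(X_1 < X_2) = P(\xi_1 > \xi_2) = \sigma_2/(\sigma_1 + \sigma_2)$, and the exponential-case MLE of $R$ carries over by substituting $v(x)$ for each observation. The simulation compares the MLE and a direct Monte Carlo estimate with the exact value; all helper names and constants are illustrative.

```python
import math
import random


def burr_x_sample(sigma, size, rng):
    """Inverse-CDF draws from Burr type X with F(x) = (1 - exp(-x**2))**sigma, x > 0."""
    return [math.sqrt(-math.log1p(-rng.random() ** (1.0 / sigma))) for _ in range(size)]


def v(x):
    """v(x) = -ln(1 - exp(-x**2)); v(X) is exponential with rate sigma, and v is decreasing."""
    return -math.log1p(-math.exp(-x * x))


def r_mle(x1, x2):
    """MLE of R = P(X1 < X2) through the exponential case: rate MLEs
    s_k = n_k / sum(v(x)); since v is decreasing, R = P(xi1 > xi2) = s2 / (s1 + s2)."""
    s1 = len(x1) / sum(v(a) for a in x1)
    s2 = len(x2) / sum(v(b) for b in x2)
    return s2 / (s1 + s2)


rng = random.Random(2024)
sigma1, sigma2 = 1.5, 3.0
r_true = sigma2 / (sigma1 + sigma2)
x1 = burr_x_sample(sigma1, 200, rng)
x2 = burr_x_sample(sigma2, 200, rng)
r_hat = r_mle(x1, x2)
print(r_true, r_hat)

# Direct Monte Carlo estimate of P(X1 < X2) from fresh independent pairs.
pairs = 50000
hits = sum(a < b for a, b in zip(burr_x_sample(sigma1, pairs, rng),
                                 burr_x_sample(sigma2, pairs, rng)))
print(hits / pairs)
```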

Bayesian inference can be carried out with various kinds of prior distributions. However, in order to minimize the subjectivity contributed by the choice of prior, one may choose *objective* Bayesian analysis, which is based on *noninformative* prior pdfs. There are various kinds of noninformative priors, e.g., Jeffreys' prior, reference priors and matching priors (for a review of noninformative priors see, e.g., Berger (1993), Bernardo and Smith (1994), and Robert (2001)). Evaluation of those priors is usually not an easy task; therefore, many authors have been involved in their derivation. To help researchers in their use of objective Bayesian methods, Yang and Berger (1997) compiled a catalog of noninformative priors. It is evident from the catalog that, if the pdfs of two distributions are related to each other as in equations (1) and (2), and $\pi(\tau)$ is the Jeffreys or reference noninformative prior for $g(\xi\mid\tau)$, then $\pi^*(\Theta) = \pi(\eta(\Theta))\,|J_\eta(\Theta)|$ is the noninformative prior of the same kind for $f(x\mid\Theta)$ (compare with Theorem 4). Hence, one can easily find noninformative priors for less common distribution families using transformations, without re-doing all the calculations.
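The invariance claim can be checked for the Jeffreys prior, which is the square root of the Fisher information. Under the assumed Weibull parameterization with known α, the Fisher information for σ is computed below by numerical quadrature, and its square root is compared with $\pi(\eta(\sigma))\,|J_\eta(\sigma)|$ for the exponential Jeffreys prior $\pi(\lambda) = 1/\lambda$; both should equal $\alpha/\sigma$. The trapezoidal rule, grid size and truncation point are illustrative choices, and reference priors are not checked here.

```python
import math

ALPHA = 2.0  # known Weibull shape (assumed); 2.0 keeps the integrand smooth at 0


def weibull_pdf(x, sigma, alpha=ALPHA):
    z = x / sigma
    return (alpha / sigma) * z ** (alpha - 1) * math.exp(-z ** alpha)


def score(x, sigma, alpha=ALPHA):
    """d/dsigma of log f(x | sigma)."""
    return -alpha / sigma + alpha * x ** alpha / sigma ** (alpha + 1)


def fisher_info(sigma, alpha=ALPHA, n_grid=20000):
    """E[score^2] by the trapezoidal rule on [0, 10 * sigma] (the tail beyond is negligible)."""
    upper = 10.0 * sigma
    h = upper / n_grid
    total = 0.0
    for k in range(n_grid + 1):
        x = k * h
        w = 0.5 if k in (0, n_grid) else 1.0
        total += w * score(x, sigma, alpha) ** 2 * weibull_pdf(x, sigma, alpha)
    return total * h


def jeffreys_via_transform(sigma, alpha=ALPHA):
    """pi*(sigma) = pi(eta(sigma)) |J_eta(sigma)| with pi(lam) = 1/lam, eta(sigma) = sigma**(-alpha)."""
    lam = sigma ** (-alpha)
    return (1.0 / lam) * alpha * sigma ** (-alpha - 1)


for s in (0.5, 1.0, 2.0, 4.0):
    print(s, math.sqrt(fisher_info(s)), jeffreys_via_transform(s))
```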

**Example 4.** Consider the problem of finding noninformative priors for the parameter σ of the Burr type X distribution and the parameters (σ, ρ) of the Pareto distribution (see Table 1). Note that neither of these families is a location-scale family, and the catalog of Yang and Berger (1997) does not list expressions for the Jeffreys and reference priors in these cases. However, it states that in the case of the one-parameter exponential distribution the Jeffreys and reference priors are both of the form 1/λ, while in the case of the two-parameter exponential distribution the Jeffreys prior is 1 and the reference prior is 1/λ. Hence, the Jeffreys and reference priors for the parameter σ of the Burr type X distribution are both equal to 1/σ, while in the case of the Pareto distribution the Jeffreys prior is equal to 1 and the reference prior is 1/σ.

**Example 5.** Kim et al. (2000) developed the Jeffreys and reference priors for the stress-strength system in the case of the Burr type X distribution. However, since the Burr type X distribution can be obtained from the one-parameter exponential distribution by setting $\xi = -\ln(1 - e^{-X^2})$, the results could simply have been copied from Thompson and Basu (1993), where noninformative priors are obtained in the case of the one-parameter exponential distribution. Namely, the Jeffreys and reference priors there are both equal to $1/(\lambda_1\lambda_2)$, where $\lambda_1$ and $\lambda_2$ are the parameters of the two exponential distributions, and Thompson and Basu also derived the corresponding posterior pdf. Hence, the Jeffreys and reference priors coincide in the case of the Burr type X distribution and are equal to $1/(\sigma_1\sigma_2)$. The corresponding posterior can be found from Thompson and Basu (1993) by replacing $x$ with $-\ln(1 - e^{-x^2})$ in the posterior and multiplying it by the Jacobian factor $|J_\eta(\Theta)|$, as in Theorem 4.

Even in the absence of dedicated material in current textbooks, transformations of variables can be incorporated into an advanced course on statistical inference. While teaching topics in probability, an instructor should explain how one distribution can be obtained from another via transformations of variables (using, for example, Table 1 of this paper). Then each of the standard topics in statistical inference can be supplemented by examples and problems where transformations are applied. For this purpose, of course, Theorems 1-6 of this paper should be presented in class at appropriate moments. Below we list a few sample problems which can be assigned as homework, and more advanced projects that are suitable for undergraduate research.

**Problem 1.** The MLEs of the parameters λ and μ in the two-parameter exponential distribution are $\hat\lambda = 1/(\bar X - X_{(1)})$ and $\hat\mu = X_{(1)}$, respectively. Using transformations of variables and Table 1, a) find the MLEs of the parameters ρ and σ of the Pareto distribution; b) find the MLEs of the parameters ρ and σ of the Power distribution.

**Problem 2.** It is well known that the sample mean $\bar X$ is the UMVUE of 1/λ, where λ is the rate parameter of the one-parameter exponential distribution (see Table 1). Using transformations of variables and Table 1, a) find the UMVUE of the parameter σ of the Rayleigh distribution; b) find the UMVUE of the parameter σ of the Extreme Value distribution.

Can the UMVUE of the parameter σ of the Weibull distribution be constructed using transformation techniques if the parameter α of the Weibull distribution is known and α ≠ 1?

**Problem 3.** a) Find the LRT of

based on a sample $X_1, \dots, X_n$ from a population with the pdf $f(x\mid\lambda, \mu) = \lambda \exp(-\lambda(x-\mu))\, I(x \ge \mu)$, where both μ and λ are unknown (problem 8.7(a) of Casella and Berger (2002)).

b) Using transformations of variables and Table 1, find the LRT of

based on a sample from a two-parameter Pareto distribution where both parameters ρ and σ are unknown.

**Sample project 1.** The monograph of Voinov and Nikulin (1993) contains UMVUEs for a variety of distribution families and parametric functions. However, as Example 1 of this paper shows, the lists of UMVUEs are not complete. For example, Section A5 contains unbiased estimators of 105 functions of λ in the one-parameter exponential family (some of those estimators are also suitable for functions of the scale parameter of the gamma distribution with known shape parameter). However, Section A7 presents unbiased estimators for only 31 functions of the scale parameter of the Weibull distribution, and Section A8 lists unbiased estimators for only 32 functions of the parameter of the Rayleigh distribution. Using Table 1 and the technique of transformations, one can expand the collection of UMVUEs in Sections A7 and A8 using the UMVUEs in Section A5 of Voinov and Nikulin (1993). Some of these estimators may be new and publishable.

**Sample project 2.** The monograph of Kotz, Lumelskii and Pensky (2003) discusses a variety of techniques for point and interval estimation of $R = P(X_1 < X_2)$ on the basis of independent samples from the distributions of $X_1$ and $X_2$.

Let $X_1 \sim \alpha(\mu_1, \tau_1)$ and $X_2 \sim \alpha(\mu_2, \tau_2)$ be independent random variables with alpha distributions. After careful examination, one can notice that the alpha distribution can be obtained from the normal distribution by a mere transformation of variables and parameters. Hence, using transformation of variables, one can derive the MLE, the UMVUE, the Bayes estimator, and exact and asymptotic confidence intervals for $R$ in the case of the alpha distribution family (see also Theorems 2.7-2.9 in Kotz, Lumelskii and Pensky (2003), which describe applications of transformations to the stress-strength problem). The resulting estimators will be new and of interest to researchers in the field of reliability.

In the present paper, we exploit the tool of transformations of random variables to obtain various statistical inference procedures with minimal effort. We discuss a few very straightforward examples of applications of transformations which can be appreciated even by a student who has taken an upper-level undergraduate or lower-level graduate statistics course. Moreover, using the simple techniques suggested above, one can easily expand the list of UMVUEs provided in Voinov and Nikulin (1993) and construct statistical procedures for less familiar distribution families using known results. In addition, we discuss a few more sophisticated applications of the transformation techniques to the stress-strength model and to elicitation of prior distributions.

The utility of the method, however, is not limited to the procedures and models discussed in Sections 2 and 3. It can be applied in virtually any area where one has to deal with a variety of parametric families of distributions. Since transformation of variables is a routine part of almost any calculus-based statistics course, it would be worthwhile to introduce the methodology developed above into standard textbooks, thereby saving the hundreds of hours that are spent re-deriving inferential procedures for various distribution families.

Marianna Pensky was supported in part by the NSF grants DMS-0505133 and DMS-0652524. Naomi Brownstein was supported by a Student Mentor Academic Research Teams (SMART) grant from the Burnett Honors College at the University of Central Florida.

Awad, A.M., Gharraf, M.K. (1986). Estimation of *P (Y < X)* in the Burr case: a comparative study.
*Commun. Statist. -- Simul. Comp.*, 15, 389-403.

Berger, J. O. (1993). *Statistical Decision Theory and Bayesian Analysis*. Corrected reprint of the second (1985) edition. Springer-Verlag, New York.

Bernardo, J. M., Smith, A. F. M. (1994). *Bayesian Theory*. Wiley, Chichester.

Casella, G., Berger, R. L. (2002). *Statistical Inference*. Second ed., Duxbury, California.

Enis, P., Geisser, S. (1971). Estimation of the probability that *Y > X*. *J. Amer. Statist. Assoc.*, 66, 162-168.

Johnson, N.L., Kotz, S., Balakrishnan, N. (1994). *Continuous Univariate Distributions*, Vol. 1. Second ed., Wiley, New York.

Johnson, N.L., Kotz, S., Balakrishnan, N. (1995). *Continuous Univariate Distributions*, Vol. 2. Second ed., Wiley, New York.

Kim, D.H., Sang, G.H., Jang, S.C. (2000). Noninformative priors for stress-strength system in Burr-type X model. *J. Korean Statist. Soc.*, 29, 17-27.

Konishi, S. (1991). Normalizing Transformations and Bootstrap Confidence Intervals, *Ann. Statist.*, 19, 2209-2225.

Kotz, S., Lumelskii, Y., Pensky, M. (2003). *The Stress-Strength Model and Its Generalizations. Theory and Applications.* World Scientific Co., Singapore.

Linton, O. B., Chen, R., Wang, N., Härdle, W. (1997). An Analysis of Transformations for Additive
Nonparametric Regression, *J. Amer. Statist. Assoc.* 92, 1512-1521.

Marron, J.S., Ruppert, D. (1994). Transformations to reduce boundary bias in kernel density estimation. *J. Roy. Statist. Soc. Ser. B*, 56, 653-671.

Robert, C. P. (2001). *The Bayesian Choice. From Decision-theoretic
Foundations to Computational Implementation.* Second ed. Springer-Verlag, New York.

Ruppert, D., Cline, D.B.H. (1994). Bias reduction in kernel density estimation by smoothed empirical transformations. *Ann. Statist.* 22, 185-210.

Sun, D., Ghosh, M., Basu, A.P. (1998). Bayesian analysis for a stress-strength system under noninformative priors. *Canad. J. Statist.*, 26, 323-332.

Surles, J.G., Padgett, W.J. (1998). Inference for *P(Y < X)* in the Burr type X model. *J. Appl. Statist. Sci.*, 7, 225-238.

Thompson, R.D., Basu, A.P. (1993). Bayesian reliability of stress-strength systems. *Advances in Reliability*, ed. Basu, A.P., Elsevier Science Publishers, Amsterdam, pp. 411-421.

Tong, H. (1974). A note on the estimation of *P (Y < X)* in the exponential case. *Technometrics*, 16, 625. Errata: *Technometrics*, 17, 395.

Van Buuren, S. (1997). Optimal transformations for categorical autoregressive time series. *Statist. Neerlandica* 51, 90-106.

Vidakovic, B. (2004). Transforms in Statistics, *Handbook of Computational Statistics Concepts and Methods*, Chapter II.7. Eds. Gentle, J., Härdle, W., and Mori, Y., Springer-Verlag, Heidelberg pp. 199-236.

Voinov, V.G., Nikulin, M.S. (1993). *Unbiased Estimators and Their Applications. Volume 1: Univariate Case.* Kluwer Academic Publishers, Dordrecht, The Netherlands.

Yang, L., Marron, J.S. (1999). Iterated transformation-kernel density estimation. *J. Amer. Statist. Assoc.*, 94, 580-589.

Yang, R., Berger, J. (1997). A catalog of noninformative priors. ISDS Discussion Paper 97-42.

Naomi Brownstein

Department of Mathematics

University of Central Florida

Orlando, FL 32816

*naomi@brownstein.info*

Dr. Marianna Pensky

Department of Mathematics

University of Central Florida

Orlando, FL 32816

*mpensky@pegasus.cc.ucf.edu*
