Journal of Statistics Education, v17n1: António Teixeira, Álvaro Rosa and Teresa Calapez

António Teixeira, Álvaro Rosa and Teresa Calapez
IBS – ISCTE Business School (Lisbon)

Journal of Statistics Education Volume 17, Number 1 (2009), jse.amstat.org/v17n1/teixeira.html

Copyright © 2009 by António Teixeira, Álvaro Rosa and Teresa Calapez, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.

Key Words: Effect size; Excel; Non-central distributions; Non-centrality parameter; Normal distribution; Power.

Abstract

This article presents statistical power analysis (SPA) based on the normal distribution using Excel, adopting textbook and SPA approaches. The objective is to present the latter in a comparative way within a framework that is familiar to textbook level readers, as a first step to understand SPA with other distributions. The analysis focuses on the case of the equality of the means of two populations with equal variances for independent samples with the same size.

This is the situation adopted as case 0 by Cohen (1988), a pioneer in the subject, to develop his set of tables and so, the present article can be seen as an introduction to Cohen’s methodology applied to tests based on samples from normal populations. We also discuss how to extend the calculation to cases with other characteristics (cases 1 to 4), similarly to what Cohen proposes, as well as a brief discussion about the advantages and shortcomings of Excel. We teach mainly in the area of business and economics, which determines the scope of our analysis.

1. Introduction

The cause of this situation is most surely the fact that the extension of SPA involves an additional set of (non-central) distributions and a greater number of tables. The panorama is not exactly the same in the social sciences where the greater importance given to the subject can be explained by the American Psychological Association (APA) requirements (Kline, 2004; American Psychological Association, 1994 and 2001).

As Smithson (2002) alleges when talking about a similar subject (confidence intervals with non-central distributions), "prior to the wide availability of computing power, exact confidence intervals for the proportion for small samples were painstakingly hand-calculated and tabulated. Confidence intervals whose underlying sampling distributions are not symmetrical and not a constant shape were labor intensive and impractical without computers, so they were (and still are) neglected in textbooks and, ironically, popular statistical computer packages." For example, in what concerns SPA, SPSS Inc offers a product that is independent of the SPSS application itself, Sample Size.

We intend to fill this gap by presenting an integrated analysis with a common language that will allow the development of a set of tools to be used knowingly by students and practitioners, giving them a path to think about the problem and to analyze results produced by others, eliminating the pure automation involved in obtaining numbers by pressing a key in the computer keyboard, or avoiding what Pestana and Velosa (2002) call turning off their heads when turning on the computer. With the advent of computers and the generalization of its use, computation limitations are something from the past. Excel has the capacity to implement these models. The necessary capabilities that are not built in can be created through user defined routines. It is not certain that exact results are always obtained but Excel is a privileged tool for students to perform and follow the several steps involved in the situations under consideration.

When designing a course we must be aware that students can have in their future professional lives two levels of contact with the subjects taught. One refers to performing statistical analyses and the other reading those made by others. The teacher must not consider courses with closed borders. Doors must be opened, allowing insights into things beyond the course limits. As we teach at graduation and post-graduation levels, we have been confronted with many cases of students declaring "now I understand what you really meant when saying … ." If no such doors exist, crossing to wider perspectives is more difficult.

Understanding Cohen’s framework leads to understanding the effect size approach widely used in the social sciences. Even if students will never use this methodology, they may be exposed to it when reading analyses performed by others. Additionally, they may be exposed to it, especially in the fields of the social sciences, in courses like research methods. We consider that teachers who have such students in their classes might find it helpful for thinking about how to present power analysis in class.

Although table based teaching introduces limitations to the domain to be covered and in some cases returns discrete information about a continuous underlying distribution, we decided to conduct this first approach in parallel with the tables created by Cohen (1988). In this way, the present article is an introduction to Cohen’s methodology applied to tests based on samples from normal populations. It could be seen as "the chapter that Cohen (1988) did not write" or a preamble to a better understanding of his work. Our methodology will then show how one can easily construct Cohen-like tables using Excel. Besides constituting a tribute to Cohen’s pioneer work, we also believe this is the best way to convey the knowledge at the level we want at this stage, which is to understand the methods involved and the tools available instead of supplying automated solutions subject to the "garbage in garbage out" perils. We do not intend to follow the instant pudding route. Even if the user has access to software applications that will directly give the desired results, we intend to arm him with enough knowledge and alternative routes to check those results by performing the calculation process step by step.

Tables can also be seen as a base for charting results. In this specific case, there is a wide range of combinations of inputs and formulas to use. When comparing functions and routines used, these tables define what we can call a Cohen’s universe, fixing borders. We can confine accuracy analyses to this universe, especially when dealing with the subject for teaching purposes.

Furthermore, computing power directly does not replace the use of tables in teaching. Instead of static tables we can have applications that will reformulate the tabulated values each time entry values are changed. For example, in SPA, the teacher can generate problem-specific tables and include them in examination papers instead of generic tables that would require a large variety of entries. This kind of models and reasoning tends to assume a more important role, as classes - and even examinations – are becoming more and more lab based. The construction of tables is also a good exercise for students to become familiar with the characteristics of statistical distributions.

As this article is Excel based, we also include a section about software limitations, somewhat beyond the scope of this specific article, in order to create a centralized reference to this matter to be used in forthcoming work. This article is just the first of a series about power and precision analysis. The concentration of these considerations helps to confer unity to the set of final articles.

To avoid unnecessary extension of the article, we assume that the reader is familiar with concepts such as type I (alpha) and type II (beta) errors, structure of hypotheses testing and the power (pi) of a test, avoiding the need to explain them in detail. The meaning of all symbols used can be found at the end of the article. This results in a more compact and easier to read article.

2. Excel Advantages, Shortcomings, and Oddities

The wide use of Excel is the main reason for its choice. It is a tool familiar to most users for the development of models related to a wide range of scientific and practical fields. Despite the limitations we are identifying here, it is an excellent tool for teaching purposes, allowing students to shed light into what can otherwise be seen as black boxes. With Excel one can learn while performing all calculation steps and easily creating graphical representations. On the other side, the shortcomings we are listing emphasize the necessity of analyzing critically computer generated results even in applications that are considered as more reliable.

There are two eras concerning the use of Excel for statistical analysis: pre and post Excel 2003. In his review of Excel 2003, Walkenbach (2008), referring to what Microsoft indicates as enhancements in the statistical functions, states: "This, of course, is a new definition of the word enhancements. Truth is, these functions have been broken for more than a decade, and they finally return accurate results in Excel 2003."

Knüsel (1998, 2002, 2005) has been studying the accuracy of Excel statistical functions and considers that "some of previously indicated errors in Microsoft Excel 97 and Excel XP have been eliminated in Excel 2003. But some others have not been corrected … and new ones have been found …". Sleeper (2006) approaches this subject in a subchapter named Defects and Limitations of Microsoft Excel Spreadsheet Software where he presents a list of problems either solved or remaining in Excel 2003, indicating that he still found the latter in the beta version of Excel 2007. We have checked the cases indicated in Knüsel (2005) and found that nothing has changed in the release version.

In statistical terms the remaining problems concern the accuracy of statistical functions in part of their domain, namely, using Excel designations, in RND (in what concerns the random number generator), POISSON, BINOMDIST, CHIINV, GAMMADIST, BETAINV, and NEGBINOMDIST.

Sleeper (2006) also refers to the ill-functioning of Excel’s QUARTILE and PERCENTILE functions. It is not quite so. As Larson and Faber (2000) acknowledge, there are several ways of finding the quantiles of a data set. Instead of the most usual formulas to calculate the position of the first and third quartiles, (n+1)/4 and 3 (n+1)/4, Excel uses (n+3)/4 and (3n+1)/4 respectively. These seldom appear in textbooks but can be found, for instance, in Murteira et al. (2002). The method should be included in the help file to avoid misinterpretations.

Another shortcoming we did not find reported is the lack of ability to detect multimodal data: the MODE function, given any data set, returns only one value. The problem is that, for multimodal data, the value reported may vary upon sorting.

As a conclusion Sleeper (2006) acknowledges that for statistical calculations Excel 2003 and later are not perfect but acceptable, recommending that the Analysis Toolpak add-in provided with any version of Excel should never be used, giving as reference the Microsoft Knowledge Base Article to get a detailed view of the problems involved (Microsoft, 2007), taking into account that some of them have been improved.

Besides the limitations indicated above there is a wider set of oddities, some of them presented by Walkenbach (2008) regarding the sum of Boolean values. We also found that validation of input does not work when using the Paste Special alternative. Microsoft (2007a, 2007b) elaborates about problems involving floating-point arithmetic and how to remedy them.

Whenever a new application is developed bugs tend to appear. Nevertheless, Excel is a valuable and versatile tool to accomplish a wide range of objectives including the one we are pursuing: to teach statistics.

3. Statistical Power Analysis with the Normal Distribution Using Microsoft Excel

There are several different situations involving the tests for the mean of one or two populations. Cohen (1988) considers those indicated in Table 1.

Case 0	2 means; σ_a = σ_b ; n_a = n_b
Case 1	2 means; σ_a = σ_b ; n_a ≠ n_b
Case 2	2 means; σ_a ≠ σ_b ; n_a = n_b
Case 3	1 sample size n
Case 4	2 paired observations in a sample size n

He develops the tables for case 0, although they can also be used for the other cases provided some additional calculations are made to find the tables’ input values. Consequently, we will deal with this case as the base situation. The tests can also be one tailed (upper or lower) or two tailed. Cohen presents tables for two and one upper tailed tests (obviously, the latter will allow the calculation of power of one lower tailed test). All the other cases can be transformed into one of these two. It is notable that Cohen leaves aside the case in which σ_a ≠ σ_b ; n_a ≠ n_b. We will also ignore it for now, leaving its consideration for future work on power and precision while testing means.

Due to its similarity, we will deal only with two tailed tests, using the upper tail to calculate the power. The case we consider is then the one on the left in Table 2, where H_p and μ_p refer to the values of the mean to use when calculating power. This restriction does not affect the generality of the conclusions.

The test statistic can be either the sample mean

or its standardized value

in the case of one population whereas in the case of two populations, it can be the difference between the two sample means

or its standardized value

. Textbooks show both, standardized and non-standardized, to arrive at a decision, but the value of the type II error, beta, is calculated (almost always) using the non-standardized test statistic. It happens that Cohen’s methodology asks for the opposite, that is, the use of the standardized value.

In any case, the test is carried out by defining a rejection

and a non-rejection

interval that are expressed in terms of the statistic used for the test. As we will calculate power as π = 1 − β, we are concerned with

, which is defined in Table 3. Note that we are using ]a;b[ for an open interval, [a;b] for a closed one, and every mixed situation accordingly with these.

Test Statistic

Two Tailed

Upper Tailed

Unstandardized

Standardized

We can approach power from two perspectives: (1) a posteriori: calculating the power after performing the test; (2) a priori: calculating the sample size needed to obtain a minimum desired value for power, as well as for alpha. Additionally, we will consider only the cases in which the finite population correction factor for the calculation of the standard error is not necessary.

3.1 Computation of Power Using the Usual Way

As an example of calculating power for the two tailed test alternative, we will use the specific situation presented in Table 4.

An investor is interested in testing the hypothesis that the mean household monthly income is the same in two metropolitan areas. For this purpose, a statistical consultant performed a survey using two independent simple random samples with the same size: , obtaining the following results for the mean of each sample: .

From the results of previous studies, the consultant assumes that the standard deviation of the household incomes in both areas is the same: The value is used for the statistical significance of the test. The alternative value of the difference between the two means for computing the power of the test is

3.1.1. Power Computation a posteriori

The usual way of performing this test using

as the test statistic leads to the results presented in Table 5. Putting aside for now the calculation of beta, the decision is taken based on the assumption that H₀ is true, i.e. there is no difference between the two population means. From Table 5 we can see that the test statistic belongs to the non-rejection interval, leading to the non-rejection of H₀.

The probability of committing a type II error (beta) is associated with the non-rejection interval of the normal curve represented at the left on Figure 1. However, when computing the type II error, the falsity of H₀ is assumed, meaning that we can no longer work with a normal distribution with μ₀ = (μ_a − μ_b)_H₀ = 0.

Therefore, the calculation of power needs specification of alternative values for the difference between the two means. The situation can be dealt (as we do in this case) either with the consideration of a point value indicated in H_p, or by constructing a power function for several alternative values for the mean. We followed the former because the later is just a repetition of the procedure presented.

Type I error (alpha) is represented by the darker (blue) shaded area in Figure 1 and Type II error (beta) by the lighter (grey) shaded area, with value

The power of the test, π = 1 - β = 0.89, is represented by the brown area in Figure 1.

Figure 2 shows how this value can be easily obtained using the functions embedded in Excel. Please note that in the Excel sheets pictured, grey background cells correspond to inputs, white background cells to intermediate results and inverse background (yellow over blue) cells to final results.

Figure 2 – Computation a posteriori of power using Excel formulas and embedded functions

3.1.2. Power Computation a priori

The a priori computation of power is the determination of a sample size n guaranteeing that pre-defined values for alpha and beta will not be exceeded. Let us suppose that we want to test the equality of the population means in a way that the values of alpha and beta will not be greater than 0.05 and 0.10 respectively. Figure 3 illustrates the base from which we can derive a formula to compute the minimum n.

Figure 3 – Basis to derive the sample size given maximum alpha and beta values

As we are only considering cases in which the beta value is no greater than 0.50, then z_β will always be negative, whereas z_1-a/2 will always have the opposite sign. This is reflected in the graphic since the upper critical value,

, is always between the values of μ₀ and μ_p: we have to sum some positive quantity to μ₀ in order to reach it. In the same way, we have to sum some negative quantity to μ_p to obtain the same value. Consequently, the subtraction (z_1-a/2 − z_β) can be seen as the sum of two values of the same sign, causing its square to be equal to the squared sum of the absolute values. Thus, the above formula (3) can be rewritten as:

The advantage of using (4) instead of (3) is that it still holds if the power is calculated using the lower tail, since

and

. Also note that (4) gives us in fact a lower bound on n, so we should take as minimum sample size the first integer greater or equal to the value obtained.

This formula is very easy to calculate in Excel, as it can be seen in Figure 6. For the example given, taken an alpha of 0.05 and a beta of 0.10, the sample size for each population’s sample should be no lower than 38.

3.2 Computation of Power Using the Standardized Value

3.2.1. Power Computation a posteriori

Another way of performing the test is using the standardized normal distribution, i.e., adopting

as the test statistic. For the example above the intervals of rejection and non-rejection would be in this case:

As the decision must be the same as before, the standardized value lies in the non-rejection region. To compute the type II error, the remarks made previously remain valid: it is a matter of calculating the probability of occurrence of the non-rejection interval with a normal curve that is not the one corresponding to the validity of H₀. Only now we are using a different variable,

, which is standardized only under the validity of the null hypothesis. As, for the computation of power, we must assume the falsity of H₀, then

is no longer standardized, but still has unitary variance. For the example given, the alternative normal curve has a mean of

We will use an oxymoron to name it: non-central standardized normal distribution, as it will be useful in the more general framework we want to develop, although that distribution does not exist (if it is standardized, its mean is zero). It suggests that we are referring to a normal distribution in terms of z, but with a non-zero mean of μ = 0 + δ. Although this is not standard terminology, it will help facilitate understanding the (central) t versus non-central t distribution.

Unlike other distributions, non-central and central normal distributions have exactly the same shape, that is, non-centrality does not affect the curve, except for a horizontal translation along the z axis. This is why computing power is much easier when the normal distribution is concerned. In this case:

To completely specify a non-central standardized normal distribution it is only necessary to define the non-centrality parameter, δ. In the present case, this parameter equals the standardized difference of means (7), i.e.

Note that: i) the non-central distribution tends to the correspondent central distribution as δ tends to zero; and ii) in the above formula, the denominator is the standard error of the difference between the two means because what is desired is the distribution of this variable.

We can use Excel in two different ways to calculate power using z as the test statistic. The first one is conceptually pure, adding the non-centrality parameter to the mean (zero in this case), as illustrated in Figure 4.

The other way is to decentralize the critical values of z by an amount δ and use the central standardized normal distribution to calculate probabilities, as shown in Figure 5:

The first option is conceptually sound while the second facilitates the introduction of formulas by using the NORMSDIST instead of the NORMDIST function.

3.2.2. Power Computation a priori

Equation (4) shows that the formula to compute a priori power does not contain the test statistic. This means that it can be applied to either

3.3 Computation of Power Using the Effect Size

Instead of introducing a non-centrality parameter, δ, Cohen, when formulating his SPA framework, defined the effect size d in answer to the necessity he felt of "a ‘pure’ number, one free of our original measurement unit, with which to index what can be alternately called the degree of departure from the null hypothesis of the alternate hypothesis, or ES (effect size) we wish to detect". The effect size can be seen as the extent to which the two populations do not overlap (it is the extent to which the experiment has an effect of separating the two populations). In this case what is involved is the distribution of the original random variable and not a statistic obtained from sampling. Thus, the measure of dispersion used for standardization is the standard deviation of that variable (σ_A = σ_A = σ).

For the two independent samples case, with homogeneous variances, this effect size is:

Cohen (1988) also presents other measures of the effect size (U₁, U₂, U₃, r and r²). We do not discuss these here. Cohen suggests that, in the absence of information, one might chose to fix the effect size as small (d=0.2), medium (d=0.5) or large (d=0.8) depending upon the subject studied.

3.3.1. Power Computation a posteriori

One of the advantages of working with effect sizes and standardized statistics is that we no longer need to specify explicitly the alternative value, μ_p, to consider for the calculation of power. We only have to give, as input, alpha, the sample size n and the effect size considered.

For the example we are following, with a sample size of 36 and an alpha value of 0.05, we can either fix an alternative value of 450 (as we have done previously) or, equivalently, chose an effect size of 0.75 (large). Type II error, and hence power, can then be easily computed in Excel, as d=0.75 gives

and β=0.111 as seen on Figure 5.

This relationship allows the construction of general tables, which we will refer to as Cohen-like tables, presented in section 4.

3.3.2. Power Computation a priori

The formula to compute the sample size that guarantees minimum pre-defined values of alpha and beta is very similar to the one used when working with the unstandardized statistic and it can be deducted directly from (4):

4. How Cohen Deals With the Other Cases

Cohen’s tables consider case 0 for a posteriori and a priori power. Let us see the methods he proposes to calculate power for the other cases based on the results of the tables, recalling that we are analyzing only two tailed tests using the upper tail when the value of beta is not greater than 0.50.

4.1 A Posteriori Power

Adopting Appendix B’s Cohen-Like Table as reference, Table 6 presents the formulas to get the inputs for reading a posteriori power for all cases. The inputs for Appendix B’s Table are the effect size, d, and the sample(s) size n, while the output, a posteriori power π, is expressed as a percentage. Instead of computing d, one can always choose directly a magnitude for the effect size, as proposed by Cohen.

Table 6 – Applying Cohen-Like table for a posteriori power to all cases
INPUTS
Case	s	n	d
Case	s	n	Formula	Small	Medium	Large
0				0.2	0.5	0.8
1		(harmonic mean)		0.2	0.5	0.8
2				0.2	0.5	0.8
3				0.2	0.5	0.8
4	Case 3 methodology can be applied for a variable representing the difference between the paired observations:

4.2 A Priori Power

The starting point is the Cohen-Like Table presented in Appendix C. As before, we can either compute the effect size, d, or choose a small, medium or large magnitude of effect. After fixing the power to be obtained, π, we can directly read a value, n, from the Appendix C Table. Table 7 shows the relationship between this value and the sample(s) size(s) to consider.

Table 7 – Applying Cohen-Like table for a priori power to all cases
Case	Sample(s) size(s)
Case	Sample(s) size(s)
0
1	Fix n_a (> n/2)
2
3
4	Case 3 methodology can be applied for a variable representing the difference between the paired observations:

5. Effect Size, Power and Precision in Statistical Inference

When conducting inference for population mean(s) we can follow two mutually exclusive routes: power or precision analysis. The former is done through testing hypotheses and the latter by the construction of confidence intervals. They should not be mixed. There is a mathematical similarity between the two approaches, namely in dealing with variance, but they are addressing different questions. Our goal is to deal with power analysis. This reference is just a warning. Borenstein (2008) states that it is wrong to construct confidence intervals to estimate a parameter value (precision approach) when the sample size is obtained by fixing the maximum value for the type II error (power approach) as illustrated in Figure 7. In the last case the confidence interval must be used to make a decision about the null hypothesis.

He considers that power analysis gives power for a specific effect size: "For example, the researcher might report ‘If the treatment increases the response rate by 20 percentage points the study will have power of 80% to yield a significant effect.’" In simple terms, he says that "the effect size used in the power analysis should represent the smallest effect that would be of clinical substantive significance."

A chain store intends to open a new outlet in some region A, if it is possible to assume an average household income of $50,000 for that area. In this case, the business will surely not be affected if the average household income is $49,500. Then, how large should the difference be in order to decide that it is not worth opening the new outlet? It depends mainly on the knowledge the researcher has about the subject being studied and must be defined at the start of an SPA in terms of the effect size (a standardized index). This definition is one of the factors that affects the size of the sample in what we call the design of the sample for a priori power. We can also see power as our prediction of what proportion of the studies will yield a statistically significant effect (Borenstein, 2008).

A study designed to detect a large effect will not have adequate power to detect a small effect. A study designed to detect a small effect will have power enough to detect a large effect. In this last case the power is greater than needed, that is, the sample is greater than necessary.

6. The Effect Size Approach in Class

We have been introducing some changes in the structure and methods in one of our courses in order to achieve a greater cohesion in its contents, as well as to diminish the number of formulas used (that in some cases are no more than variations on the same theme). For example, instead of treating separately sampling distributions, confidence intervals and hypotheses testing, we deal first with the case of one mean for all those aspects, and then we present the case for two means and independent samples, and so on. At every step the focus is on the calculation of the respective standard error.

Until now the experience has been positive, giving students an easier and smaller set of formulas to deal with. In what concerns the computation of beta, we anticipate that this framework will also simplify its computation – or at least without significantly increasing its complexity – while opening doors to new areas. Our previous experience in teaching it following the usual textbooks revealed great difficulties for students to acquire the desired competence. The result was a poor performance both in understanding the subject and in correctly answering questions during examinations.

For example, using only the expressions in (13) to compute type I and II errors (for one mean) increased markedly the quality of examination answers:

What used to be one of the difficult questions in examinations became a trivial one.

Now we are planning to take a further step forward, i.e., moving towards the effect size approach for the calculation of power with other distributions besides the normal. We hope it will be a simple task for our students, not only because of the unified framework we use, but also because they are already accustomed to using Excel in this context. The introduction of the effect size approach to SPA leads to the revision of the formulas presented in (13) to

7. Conclusion and Further Research

This is a first article in a series that intends to spread statistical teaching under the statistical power and precision analysis perspective. To attain this objective we intend: (1) to develop simple tools to deal with these subjects, integrated in a theoretical framework explained in a way that makes it accessible to textbook level readers, and (2) to analyze the current structure of courses in statistics for business and economics in order to suggest changes in their structures so that they accommodate better to this new perspective.

The present article is confined to statistical power analysis and aims to be a bridge between the usual way – which can only be used within testing using the normal distribution – and Cohen’s framework for statistical power analysis. This first step does not leave familiar territory, while showing a new way to act that has the advantage of being able to deal with more general situations, namely, power with non-central distributions. That will be the object of forthcoming articles, where we will present both ready to use tools and underlying concepts.

Appendix C – Cohen-Like Tables (2)

References

American Psychological Association (1994), Publication Manual of the American Psychological Association, 4^th ed., Washington, DC: Author.

American Psychological Association (2001), Publication Manual of the American Psychological Association, 5^th ed., Washington, DC: Author.

Berenson, M. L. and D. M. Levine (1996), Basic Business Statistics, 6th ed., Englewood Cliffs, New Jersey: Prentice Hall International Inc.

Berenson, M. L., D. M. Levine and T. C. Krehbiel (2004), Basic Business Statistics, 9^th ed., Upper Saddle River, New Jersey: Pearson/Prentice Hall.

Bernstein, S. and R. Bernstein (1999), Schaum's Outline Elements of Statistics II: Inferential Statistics, New York: McGraw-Hill.

Cohen, J. (1988), Statistical Power Analysis for the Behavioral Sciences, 2^nd Ed., NewYork: Lawrence Erlbaum Associates.

Freund, J. E. (1962), Mathematical Statistics, Englewood Cliffs, New Jersey: Prentice Hall International Inc.

Kazmier, L. J. and N. F. Poll (1979), Basic Statistics for Business and Economics, 2nd ed., New York: McGraw-Hill.

Kline, R. B.(2004). Beyond Significance Testing: Reforming Data Analysis Methods in Behavioral Research, American Psychological Association.

Knüsel, L. (1998), "On the Accuracy of the Statistical Distributions in Microsoft Excel 97", Comput. Statist. Data Anal. 39, pp. 109-110 (With Corrigendum in CSDA, 40, 385).

Knüsel, L. (2002), "On the Reliability of Microsoft Excel XP for Statistical Purposes", Comput. Statist. Data Anal. 26, pp. 375-377.

Knüsel, L. (2005), On the Accuracy of the Statistical Distributions in Microsoft Excel 2003, Comput. Statist. Data Anal. 48, pp. 445-449.

Larson, R. and E. Faber (2000), Elementary Statistics: Picturing the World, Pearson Education.

Levin, R. I. and D. S. Rubin (1998), Statistics for Management, 7th ed., Upper saddle River, New Jersey: Prentice Hall International Inc.

Levine D. M., M. L. Berenson and David Stephan (1999), Statistics for Managers Using Microsoft Excel, 2nd ed., Upper saddle River, New Jersey: Prentice Hall International Inc.

Microsoft Corporation (2007), "Description of the effects of the improved statistical functions for the Analysis ToolPak in Excel", http://support.microsoft.com/kb/829208/en-us (accessed on on 2008-March-4).

Microsoft Corporation (2007a), "Floating-point arithmetic may give inaccurate results", http://support.microsoft.com/kb/78113 (accessed on 2008-March-4).

Microsoft Corporation (2007b), "How to correct rounding errors in floating-point arithmetic", http://support.microsoft.com/kb/214118/en-us (accessed on 2008-March-4).

Murteira, B., C. S. Ribeiro, J. A. Silva and C. Pimenta (2002), Introdução à Estatística, Lisboa: McGraw-Hill.

Pestana, D.D. and S. F. Velosa (2002), Introdução à Probabilidade e Estatística, Lisboa: Fundação Calouste Gulbenkian.

Pinto, J. C. C. and J. J. Dias Curto (1999), Estatística para Economia e Gestão: Instrumentos de Apoio à Tomada de Decisão, Lisboa: Sílabo.

Reis E., P. Melo, R. Andrade and T. Calapez (2001), Estatística Aplicada – Vol. 2, 4th ed., Lisboa: Sílabo.

Sanders D. H. (1990), Statistics: A First Course, 5^th ed., New York: McGraw-Hill.

Sandy, R. (1990), Statistics for Business and Economics, New York: McGraw-Hill.

Watson, C. J., P. Billingsley, D.J. Croft and D.Huntsberger (1993), Statistics for Management and Economics, 5th ed., Boston: Allyn and Bacon.

Webster, A.L. (1998). Applied Statistics for Business and Economics, 3^rd ed. Boston: Irwin/McGrwaw-Hill.

Trade Marks and Intellectual Property

Microsoft® and Excel® are registered trademarks of Microsoft Corporation in the United States and in other countries.

SPSS^®and Sample Size® are registered trademarks of SPSS Inc. in the United States and in other countries.

António Teixeira
UNIDE researcher
IBS – ISCTE Business School
Lisbon, Portugal
Phone: 351217903201
afst@iscte.pt

Álvaro Rosa
UNIDE researcher
IBS – ISCTE Business School
Lisbon, Portugal
Phone: 351217903201
alvaro.rosa@iscte.pt

Teresa Calapez
UNIDE researcher
IBS – ISCTE Business School
Lisbon, Portugal
Phone: 351217903232
teresa.calapez@iscte.pt

a	Probability of incurring in a type I error; statistical significance of the test
β	Probability of incurring in a type II error
δ	Non centrality parameter
μ_a	Population A mean
μ_b	Population B mean
μ₀	Value for the mean in H₀
μ_p	Value for the mean in H_p
π	Power of the test
σ	Standard deviation of a population
σ_a	Standard deviation of population A
σ_b	Standard deviation of population B
	Standard error of the mean for samples taken from population A
	Standard error of the mean for samples taken from population B
	Standard error of the difference between the two means
d	Effect size; Cohen’s d
d_c	Sampled d values necessary for significance
H₀	Null hypothesis
H₁	Alternative hypothesis
H_p	Alternative value of the population mean to compute power
lcv	Lower critical value
ucv	Upper critical value
n	Sample size
n_a	Size of the sample taken from population A
n_b	Size of the sample taken from population B
n_λ	Size of the sample determined for precision analysis
n_π	Size of the sample determined for power analysis
R	Rejection interval
	Non rejection interval
	Sample mean
	Mean of the sample extracted from population A
	Mean of the sample extracted from population B
z	Standardized value of a variable
	Standardized value of the sample mean
	Standardized value of the difference between the sample means

Statistical Power Analysis with Microsoft Excel: Normal Tests for One or Two Means as a Prelude to Using Non-Central Distributions to Calculate Power

Abstract

1. Introduction

2. Excel Advantages, Shortcomings, and Oddities

3. Statistical Power Analysis with the Normal Distribution Using Microsoft Excel

3.1 Computation of Power Using the Usual Way

3.1.1. Power Computation a posteriori

3.1.2. Power Computation a priori

3.2 Computation of Power Using the Standardized Value

3.2.1. Power Computation a posteriori

3.2.2. Power Computation a priori

3.3 Computation of Power Using the Effect Size

3.3.1. Power Computation a posteriori

3.3.2. Power Computation a priori

4. How Cohen Deals With the Other Cases

4.1 A Posteriori Power

4.2 A Priori Power