Shai Linn
School of Public Health, Haifa University
and Rambam Medical Center, Haifa, Israel
Journal of Statistics Education Volume 12, Number 3 (2004), jse.amstat.org/v12n3/linn.html
Copyright © 2004 by Shai Linn, all rights reserved. This text may be freely shared among individuals, but it may not be republished in any medium without express written consent from the authors and advance notification of the editor.
Key Words: Bayes' Theorem; Diagnosis; Predictive value
However, we typically do not have this information for the target population, because it is often unfeasible and unethical to perform both the diagnostic test and an additional definitive test to determine the true diagnosis according to the gold standard (Sackett and Haynes, 2002). For example, using angiography as a gold standard for diagnosing cardiac ischemia by electrocardiographic changes is “not a very attractive alternative in terms of discomfort, risk and cost” (Sackett, Haynes, Guyatt, and Tugwell 1991, p. 101). Therefore, the PPV and NPV are calculated from the sensitivity, the specificity, and the prevalence of the disease in the target population, using Bayes’ Theorem. Thus, presenting the analyses for two populations in a single table (Table 1) may be pedagogically misleading. A new approach is hereby proposed, using two tables (Table 2 and Table 3) instead of one (Table 1), with specific notations for each table.
Table 1

|               |    | Gold Standard      |                    |       |
|               |    | S+                 | S-                 | Total |
| Clinical Test | T+ | a = True Positive  | b = False Positive | a+b   |
|               | T- | c = False Negative | d = True Negative  | c+d   |
| Total         |    | a+c                | b+d                |       |
Note: The table demonstrates a misleading presentation in that all test characteristics are calculated in one table.
Sensitivity = a/(a+c)
Specificity = d/(b+d)
Positive Predictive Value (PPV) = a/(a+b)
Negative Predictive Value (NPV) = d/(c+d)
This may be misleading to many students for the following reasons:
Finally, using one table to teach diagnostic test characteristics often makes the definitions of rates unclear (Riffenburgh 1993; Weinstein and Finberg 1980; Hirsch and Riegelman 1996). Does the “true positive rate” refer to the sensitivity (as it is often defined), or to the predictive value (as it is often understood by students, physicians, or patients)? Analogous considerations apply to true negative rates, false positive rates, and false negative rates. Because of this confusion, Hirsch and Riegelman (1996, pp. 11-12) recommended not using these terms at all. We offer a way to overcome these difficulties by presenting the analyses in two tables.
We use S to denote sickness, rather than D for diseased, because the letter D is already used as a cell label in the tables.
Table 2

|               |    | Gold Standard      |                    |
|               |    | S+                 | S-                 |
| Clinical Test | T+ | a = True Positive  | b = False Positive |
|               | T- | c = False Negative | d = True Negative  |
| Total         |    | a+c                | b+d                |
Note: The table demonstrates a more appropriate presentation for the study (selected) population.
Sensitivity = a/(a+c)
Specificity = d/(b+d)
fpr = b/(b+d)
fnr = c/(a+c)
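For readers who prefer a computational view, these definitions can be written as a minimal Python sketch (the function name and code are illustrative only, not part of any standard package); it computes the study-population rates directly from the cell counts of Table 2:

```python
def study_population_rates(a, b, c, d):
    """Rates defined on the selected (study) population of Table 2.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    sensitivity = a / (a + c)  # P(T+ | S+)
    specificity = d / (b + d)  # P(T- | S-)
    fpr = b / (b + d)          # P(T+ | S-) = 1 - specificity
    fnr = c / (a + c)          # P(T- | S+) = 1 - sensitivity
    return sensitivity, specificity, fpr, fnr
```

Note that every denominator is a column total (a+c or b+d), reflecting the vertical sampling of this table.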
A second table with uppercase notations (Table 3) is used to explain the predictive values in the target population, in which the test would be applied for screening or clinical diagnosis. Sampling of this population is done horizontally, i.e., among those with positive and negative tests. Thus, it is appropriate to have A+B and C+D as totals in this table if a physician monitors the success of the clinical test by ascertaining the disease status (the gold standard status) of persons with positive and/or negative test results. However, it is inappropriate to present totals along the “vertical” axis, i.e., totals for the gold standard categories. Thus, we define the Positive Predictive Value (PPV) and the Negative Predictive Value (NPV) as follows:
Table 3

|               |    | Gold Standard      |                    |       |
|               |    | S+                 | S-                 | Total |
| Clinical Test | T+ | A = True Positive  | B = False Positive | A+B   |
|               | T- | C = False Negative | D = True Negative  | C+D   |
Note: The table demonstrates a more appropriate presentation for the patient target population.
Positive Predictive Value (PPV) = A/(A+B)
Negative Predictive Value (NPV) = D/(C+D)
False Positive Rate (FPR) = B/(A+B)
False Negative Rate (FNR) = C/(C+D)
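A parallel sketch for the target population of Table 3 (again an illustrative Python fragment, not a standard routine) makes the contrast with the previous function explicit: here every denominator is a row total, A+B or C+D, reflecting the horizontal sampling of this table.

```python
def target_population_rates(A, B, C, D):
    """Rates defined on the target population of Table 3.

    A = true positives, B = false positives,
    C = false negatives, D = true negatives.
    """
    ppv = A / (A + B)  # P(S+ | T+)
    npv = D / (C + D)  # P(S- | T-)
    FPR = B / (A + B)  # P(S- | T+) = 1 - PPV
    FNR = C / (C + D)  # P(S+ | T-) = 1 - NPV
    return ppv, npv, FPR, FNR
```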
It is now obvious that the translation of information on sensitivity and specificity to PPV or NPV must be done by using Bayes’ Theorem and the prevalence P(S+).
Positive Predictive Value, PPV:

PPV = P(S+|T+) = P(T+|S+)P(S+) / [P(T+|S+)P(S+) + P(T+|S-)P(S-)]
    = sensitivity × P(S+) / [sensitivity × P(S+) + (1 - specificity) × P(S-)]

Similarly, Negative Predictive Value, NPV:

NPV = P(S-|T-) = P(T-|S-)P(S-) / [P(T-|S-)P(S-) + P(T-|S+)P(S+)]
    = specificity × P(S-) / [specificity × P(S-) + (1 - sensitivity) × P(S+)]
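These two formulas translate directly into a short Python sketch (illustrative code; the function name predictive_values is not from any package) that converts the test characteristics and the prevalence into the predictive values:

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Apply Bayes' Theorem; prevalence is P(S+) in the target population."""
    p_s_pos = prevalence
    p_s_neg = 1.0 - prevalence
    ppv = (sensitivity * p_s_pos) / (
        sensitivity * p_s_pos + (1.0 - specificity) * p_s_neg)
    npv = (specificity * p_s_neg) / (
        specificity * p_s_neg + (1.0 - sensitivity) * p_s_pos)
    return ppv, npv
```

In the target population, FPR and FNR then follow as 1 - PPV and 1 - NPV, respectively.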
When the diseased and non-diseased are sampled, as in a case-control study, the definitions are:
False positive rate among persons without the disease is

fpr = P(T+|S-) = b/(b+d)

i.e., fpr = 1 - specificity,

and

False negative rate among persons with the disease is

fnr = P(T-|S+) = c/(a+c)

i.e., fnr = 1 - sensitivity.
These definitions of the fpr and fnr, which are based on Table 2, appear in most of the above-mentioned textbooks.
Following Fleiss (1981), we can define these measures of interest in the general patient population (Table 3), using uppercase notations:
False positive rate among persons with a positive test is

FPR = P(S-|T+) = B/(A+B)

i.e., FPR = 1 - PPV.
This statistic indicates the rate of non-diseased persons who would erroneously be classified as having the disease by the clinical diagnostic test.
Clearly, using Bayes’ Theorem:

FPR = P(S-|T+) = P(T+|S-)P(S-) / [P(T+|S+)P(S+) + P(T+|S-)P(S-)]
    = (1 - specificity) × P(S-) / [sensitivity × P(S+) + (1 - specificity) × P(S-)]
Similarly,

False negative rate among persons with a negative test is

FNR = P(S+|T-) = C/(C+D)

i.e., FNR = 1 - NPV.
This statistic indicates the rate of diseased persons who would erroneously be classified as not having the disease by the clinical diagnostic test.
Clearly, using Bayes’ Theorem:

FNR = P(S+|T-) = P(T-|S+)P(S+) / [P(T-|S-)P(S-) + P(T-|S+)P(S+)]
    = (1 - sensitivity) × P(S+) / [specificity × P(S-) + (1 - sensitivity) × P(S+)]
Thus, the two-table presentation enables a clear pedagogical distinction between the error rates defined in the two different populations: fpr and fnr in the selected case-control study population (Table 2), and FPR and FNR in the target population (Table 3).
Table 4

|               |                                 | Final diagnosis by pathology, the Gold Standard |                   |
|               |                                 | Skin cancer S+    | No skin cancer S- |
| Clinical Test | Diagnosis of skin cancer T+     | 63                | 6                 |
|               | No diagnosis of skin cancer T-  | 10                | 112               |
| Total         |                                 | 73                | 118               |
The data for this study indicate a sensitivity of 86.3%, a specificity of 94.9%, an fpr of 5.1%, and an fnr of 13.7%. However, the PPV, the NPV, and the error rates in the general population cannot be calculated from Table 4. Estimates calculated directly from the table would apply to the physician's study population alone, and would yield an uninformative (and, for the general population, misleading) PPV of 91.3%, NPV of 91.8%, FPR of 8.7%, and FNR of 8.2%. Such a single-table presentation would be misleading, because it is incorrect to calculate the PPV and NPV of clinical examinations in the general population from these data. Rather, based on the sensitivity and specificity, a national prevalence of skin cancer of, say, 0.08%, and Bayes’ Theorem, the calculated PPV would be approximately 1.3407%, quite different from the PPV for the physician in a dermatology clinic. This discrepancy occurs because of the low prevalence of the disease in the general population. Similar calculations yield an NPV of 99.98845%, an FPR of 98.6593%, and an FNR of 0.01155%.

The data for the general population can be reconstructed by first determining the margins according to the prevalence, i.e., 8 patients with melanoma per 10000 persons in the general population. Then, the sensitivity and specificity can be used to yield Table 5, which is the correct presentation for the general population (because of rounding to integers in constructing the table, direct calculations from Table 5 would yield estimates slightly different from the above calculations based on Bayes’ Theorem).
Table 5

|                    |                                 | Final diagnosis by pathology, the Gold Standard |                   |       |
|                    |                                 | Skin cancer S+    | No skin cancer S- | Total |
| Clinical Test      | Diagnosis of skin cancer T+     | 7                 | 510               | 517   |
|                    | No diagnosis of skin cancer T-  | 1                 | 9482              | 9483  |
| Calculated margins |                                 | 8                 | 9992              | 10000 |
The prevalence is 0.08%; thus we expect 8 patients (a rounded number) with melanoma and 9992 healthy persons among 10000 persons.
Using a sensitivity of 86.3%, we calculate A = 7 (0.863 × 8 ≈ 6.9, rounded).
Using a specificity of 94.9%, we calculate D = 9482 (0.949 × 9992 ≈ 9482.4, rounded).
The remaining cells follow by subtraction: B = 9992 - 9482 = 510 and C = 8 - 7 = 1.
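For readers who wish to verify these figures, a short Python sketch follows (illustrative only; the rounded sensitivity and specificity are used, so the final decimals differ slightly from the exact Bayes’ Theorem values quoted above):

```python
sens, spec = 0.863, 0.949   # rounded sensitivity and specificity from Table 4
prevalence = 0.0008         # national prevalence of 0.08%
n = 10000

# Reconstruct the margins and cells of Table 5.
s_pos = round(n * prevalence)   # 8 persons with melanoma
s_neg = n - s_pos               # 9992 healthy persons
A = round(sens * s_pos)         # 7 true positives  (0.863 * 8 = 6.9)
C = s_pos - A                   # 1 false negative
D = round(spec * s_neg)         # 9482 true negatives (0.949 * 9992 = 9482.4)
B = s_neg - D                   # 510 false positives
print(A, B, C, D)               # 7 510 1 9482

# Predictive values in the general population via Bayes' Theorem.
ppv = sens * prevalence / (sens * prevalence + (1 - spec) * (1 - prevalence))
npv = spec * (1 - prevalence) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
print(ppv, npv)                 # roughly 0.0134 and 0.9999, i.e., PPV of about 1.34% and NPV of about 99.99%
```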
As has been mentioned above, most textbooks present the sensitivity and specificity together with the PPV or NPV in a single table. Moreover, some may prefer, pedagogically, to begin with a simpler single 2×2 table and then proceed to the more conceptually correct, but perhaps more complex, two-table presentation. We suggest using the two-table presentation for advanced students, or including a transition from the one-table to the two-table presentation even when teaching begins with a single table. In our experience, using two tables to describe diagnostic test characteristics is ultimately more acceptable to students, both pedagogically and conceptually.
Using the two tables and the derived equations clearly demonstrates how Bayes’ Theorem, the test characteristics (sensitivity and specificity), and the prevalence are used to calculate the PPV. It also makes it more obvious that the analyses are done in two stages, for two different populations: the selected study population and the target population. This approach makes it easier to discuss and define the two different types of false negative and false positive rates in the two populations.
P(T+) = probability of the diagnostic test being positive
P(T-) = probability of the diagnostic test being negative
P(S+) = probability of the disease, i.e., the prevalence
P(S-) = probability of no disease, i.e., 1-prevalence
vertical line ( | ) stands for "given that"
Altman, D.G. (1991), Practical Statistics for Medical Research, London: Chapman & Hall.
Baron, J.A. (2001), "Clinical epidemiology," in Teaching Epidemiology, eds. J. Olsen, R. Saracci, and D. Trichopoulos, Oxford: Oxford University Press, pp. 237-249.
Beaglehole, R., Bonita, R., and Kjellstrom, T. (1993), Basic Epidemiology, Geneva: World Health Organization.
Bhopal, R. (2002), Concepts of Epidemiology, Oxford: Oxford University Press.
Bradley, G.W. (1993), Disease Diagnosis and Decision, New York: John Wiley & Sons.
Dawson, B., and Trapp, R.G. (1994), Basic and Clinical Biostatistics, New York: Lange–McGraw-Hill.
Dawson, B., and Trapp, R.G. (2001), Basic and Clinical Biostatistics, New York: Lange Medical Books-McGraw Hill.
Essex-Sorlie, D. (1995), Medical Biostatistics and Epidemiology, New York: Appleton & Lange/McGraw Hill.
Fleiss, J.L. (1981), Statistical Methods for Rates and Proportions (2nd ed.), New York: John Wiley & Sons.
Greenberg, R.S., Daniels, S.R., Flanders, W.D., Eley, J.W., and Boring, J.R. (2001), Medical Epidemiology, London: Lange-McGraw-Hill.
Hirsch, R.P., and Riegelman R.K. (1996), Statistical Operations, Oxford: Blackwell Science.
Jenicek, M. (1995), The Logic of Modern Medicine, Montreal: EPIDEM International.
Kraemer, H.C. (1992), Evaluation of Medical Tests: Objective and quantitative guidelines, London: Sage Publications.
Pepe, M. S. (2003), The Statistical Evaluation of Medical Tests for Classification and Prediction, Oxford Statistical Science Series 28, Oxford: Oxford University Press.
Riegelman, R.K. (2000), Studying a Study and Testing a Test, Philadelphia: Lippincott Williams & Wilkins.
Riffenburgh, R.H. (1993), Statistics in Medicine, San Diego: Academic Press.
Sackett, D.L., Haynes, R.B., Guyatt, G.H., and Tugwell, P. (1991), Clinical Epidemiology (2nd ed.), Boston: Little Brown & Company.
Sackett, D., and Haynes, R.B. (2002), "The Architecture of Diagnostic Research," in The Evidence Base of Clinical Diagnosis, ed. J.A. Knottnerus, London: BMJ Publishing.
Silva, S.I. (1999), Cancer Epidemiology: Principles and Methods, Geneva: International Agency for Research on Cancer, World Health Organization.
Sox, H.C., Blatt, M.A., Higgins, M.C., and Marton, K.I. (1988), Medical Decision Making, Boston: Butterworth-Heinemann.
Wassertheil, S. (1995), Biostatistics and Epidemiology, New York: Springer-Verlag.
Weinstein, M.C., and Finberg, H.V. (1980), Clinical Decision Analysis, Philadelphia: W.B. Saunders Co.
Weiss, N.S. (1996), Clinical Epidemiology, Oxford: Oxford University Press.
Shai Linn
School of Public Health
Faculty of Welfare and Health Studies
Haifa University
and Unit of Clinical Epidemiology,
Rambam Medical Center
Haifa
Israel
slinn@univ.haifa.ac.il