ESTIMATION OF AUC AND ITS SIGNIFICANCE IN THE ASSESSMENT OF CLASSIFICATION MODELS

  • Okeh U.M

Abstract

The performance of a diagnostic test whose results are measured on a binary or ordinal scale can be evaluated using sensitivity and specificity. When results are measured on a continuous scale, performance is instead assessed over the range of possible cut-off points for the predictor variable. This is done with a receiver operating characteristic (ROC) curve, which is a graph of sensitivity against 1 - specificity across all possible decision cut-off values of the test result. The curve evaluates the ability of a test to discriminate the true state of subjects, especially in classification models. The predictive accuracy of classification models is best assessed with a summary measure of accuracy across all possible cut-off values, called the area under the receiver operating characteristic curve (AUC). In this paper, we propose a simple nonparametric method of calculating AUC from the predicted probability of a positive response to a condition, which involves multiple prediction rules. The method is based on the nonparametric Mann-Whitney U statistic. The estimation methods for AUC and their significance were assessed using some classification models. When applied to real data and compared with other existing methods of calculating AUC, the proposed method was shown to be better in assessing classification models. The method offers reliable statistical inferences and circumvents the difficulties of deriving the statistical moments of complex summary statistics seen in the parametric method. The proposed nonparametric method is recommended for calculating the AUC.
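As a rough illustration of the link between AUC and the Mann-Whitney U statistic mentioned in the abstract, the sketch below computes the standard estimate: the proportion of (positive, negative) subject pairs in which the positive subject receives the higher predicted probability, with ties counted as one half. It is not the paper's proposed procedure for multiple prediction rules; the function name `mann_whitney_auc` and the example scores are hypothetical.

```python
from typing import Sequence


def mann_whitney_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Estimate AUC as U / (n_pos * n_neg).

    U counts the (positive, negative) pairs in which the positive subject
    has the higher predicted probability; a tied pair contributes 0.5.
    """
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        raise ValueError("both groups must contain at least one score")

    u = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                u += 1.0      # pair ranked correctly
            elif p == n:
                u += 0.5      # tie counts as half a correct ranking
    return u / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities (not data from the paper).
diseased = [0.9, 0.8, 0.4]
healthy = [0.7, 0.4, 0.2]
auc = mann_whitney_auc(diseased, healthy)
print(f"AUC = {auc:.4f}")
```

The double loop is quadratic in the sample sizes, which is acceptable for a small illustration; a rank-based formulation gives the same value more efficiently on large samples.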

Published
2017-04-22
Section
Mathematics