AUC-Maximizing Ensembles through Metalearning

Abstract: Area Under the ROC Curve (AUC) is often used to measure the performance of an estimator in binary classification problems. An AUC-maximizing classifier can have significant advantages in cases where ranking correctness is valued or the outcome is rare. In a Super Learner ensemble, maximization of the AUC can be achieved by the use of an AUC-maximizing metalearning algorithm. We discuss an implementation of an AUC-maximization technique that is formulated as a nonlinear optimization problem. We also evaluate the effectiveness of a large number of different nonlinear optimization algorithms at maximizing the cross-validated AUC of the ensemble fit. The results provide evidence that AUC-maximizing metalearners can, and often do, outperform non-AUC-maximizing metalearning methods with respect to ensemble AUC. The results also demonstrate that as the level of imbalance in the training data increases, the Super Learner ensemble outperforms the top base algorithm by a larger margin.
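The metalearning step described above amounts to choosing convex-combination weights over the base learners' cross-validated predictions so as to maximize the rank-based (Mann-Whitney) AUC. The following is a minimal sketch of that idea, not the paper's implementation: the base-learner predictions are simulated, and a simple random search over the weight simplex stands in for the nonlinear optimizers (e.g., Nelder-Mead, L-BFGS-B) the paper evaluates.

```python
import numpy as np

def auc(y, scores):
    """Rank-based AUC (Mann-Whitney statistic); assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
n = 1000
y = (rng.random(n) < 0.2).astype(int)  # imbalanced binary outcome
# Simulated cross-validated predictions from two hypothetical base learners
p1 = 0.5 * y + rng.normal(0.30, 0.25, n)
p2 = 0.4 * y + rng.normal(0.35, 0.30, n)
Z = np.column_stack([p1, p2])

# Evaluate the simplex vertices (each base learner alone), then random
# convex combinations; keep the weight vector with the highest AUC.
best_w, best_auc = None, -1.0
candidates = list(np.eye(Z.shape[1]))
candidates += [rng.dirichlet(np.ones(Z.shape[1])) for _ in range(2000)]
for w in candidates:
    a = auc(y, Z @ w)
    if a > best_auc:
        best_w, best_auc = w, a
```

Because the simplex vertices are included among the candidates, the selected weights can never score below the best single base learner on the cross-validated data, which mirrors the ensemble-versus-top-base-algorithm comparison the abstract highlights. A direct AUC objective like this is piecewise constant in the weights, which is why derivative-free methods such as Nelder-Mead or stochastic searches are natural choices.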
