Computationally efficient confidence intervals for cross-validated area under the ROC curve estimates.

In binary classification problems, the area under the ROC curve (AUC) is commonly used to evaluate the performance of a prediction model. Often, it is combined with cross-validation to assess how well the results will generalize to an independent data set. To evaluate the quality of a cross-validated AUC estimate, we obtain an estimate of its variance. For massive data sets, generating even a single performance estimate can be computationally expensive, and when the prediction method is complex, cross-validating a model on even a relatively small data set can still require substantial computation time. Thus, in many practical settings, the bootstrap is a computationally intractable approach to variance estimation. As an alternative, we demonstrate a computationally efficient influence-curve-based approach to obtaining a variance estimate for cross-validated AUC.
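The core idea can be sketched in a few lines: the empirical AUC is an asymptotically linear estimator, so its variance can be estimated from a single pass over the per-observation influence-curve values rather than by resampling. The sketch below (function name and interface are hypothetical, and it shows the influence-curve variance for the empirical AUC applied to held-out scores, not the paper's exact cross-validated estimator) computes the Mann-Whitney AUC, the influence curve of each observation, and a Wald-type confidence interval.

```python
import numpy as np
from statistics import NormalDist

def auc_ic_ci(y, scores, alpha=0.05):
    """Empirical AUC with an influence-curve-based Wald confidence interval.

    `y` holds binary labels; `scores` holds the (held-out, e.g.
    cross-validated) predicted scores for the same observations.
    """
    y = np.asarray(y, dtype=bool)
    s = np.asarray(scores, dtype=float)
    s1, s0 = s[y], s[~y]                 # case scores / control scores
    n, p = len(y), y.mean()              # sample size, case prevalence

    # Empirical AUC: P(score_case > score_control), ties counted 1/2.
    cmp = (s1[:, None] > s0[None, :]) + 0.5 * (s1[:, None] == s0[None, :])
    auc = cmp.mean()

    # Influence curve of each observation (mean zero by construction):
    #   cases:    (F0(s_i) - AUC) / p,       F0 = control-score CDF
    #   controls: (S1(s_j) - AUC) / (1 - p), S1 = case-score survival fn
    ic = np.empty(n)
    ic[y] = (cmp.mean(axis=1) - auc) / p
    ic[~y] = (cmp.mean(axis=0) - auc) / (1 - p)

    # Variance of the estimator is approximately Var(IC) / n.
    se = np.sqrt(np.mean(ic ** 2) / n)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return auc, auc - z * se, auc + z * se
```

One pass over the data replaces the hundreds of model refits a bootstrap would require; in a cross-validation setting, `scores` would be the concatenated held-out predictions across folds.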
