Approximate Cross-validation: Guarantees for Model Assessment and Selection

Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV requires repeatedly refitting a learning procedure on many training datasets. Recent work in empirical risk minimization (ERM) approximates this expensive refitting with a single Newton step warm-started from the full-training-set optimizer. While this can greatly reduce runtime, several open questions remain, including whether these approximations lead to faithful model selection and whether they are suitable for non-smooth objectives. We address these questions with three main contributions: (i) we provide uniform, non-asymptotic, deterministic model assessment guarantees for approximate CV; (ii) we show that (roughly) the same conditions also guarantee model selection performance comparable to CV; (iii) we provide a proximal Newton extension of the approximate CV framework for non-smooth prediction problems and develop improved assessment guarantees for problems such as ℓ1-regularized ERM.
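To make the warm-started Newton-step idea concrete, here is a minimal NumPy sketch for ridge regression (a smooth ERM problem): the full-data optimizer is computed once, and each leave-one-out refit is replaced by a single Newton step on the leave-one-out objective starting from that optimizer. For a quadratic objective the Newton step recovers the exact leave-one-out solution, which makes the approximation easy to check; the variable names and the toy data are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 1.0
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Full-data ridge fit: minimize 0.5*||X th - y||^2 + 0.5*lam*||th||^2
H = X.T @ X + lam * np.eye(d)            # Hessian of the full objective
theta_full = np.linalg.solve(H, X.T @ y)

def exact_loo(i):
    """Exact leave-one-out refit (the expensive step approximate CV avoids)."""
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]
    return np.linalg.solve(Xi.T @ Xi + lam * np.eye(d), Xi.T @ yi)

def approx_loo(i):
    """One Newton step on the leave-one-out objective, warm-started at theta_full."""
    xi, yi = X[i], y[i]
    g_i = xi * (xi @ theta_full - yi)    # gradient of the left-out loss at theta_full
    H_i = H - np.outer(xi, xi)           # leave-one-out Hessian (rank-one downdate)
    # The full gradient vanishes at theta_full, so the LOO gradient there is -g_i;
    # the Newton step is theta_full - H_i^{-1} (-g_i).
    return theta_full + np.linalg.solve(H_i, g_i)

errs = [np.linalg.norm(approx_loo(i) - exact_loo(i)) for i in range(n)]
print(max(errs))  # tiny: for a quadratic objective the single Newton step is exact
```

For non-quadratic smooth losses (e.g. logistic regression) the same step is only an approximation, and for non-smooth penalties such as the ℓ1 norm the plain Newton step is not even defined, which is the setting the paper's proximal Newton extension addresses.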
