Fast Cross-Validation

Cross-validation (CV) is the most widely adopted approach for selecting the optimal model. However, CV is computationally expensive because it requires training the learner multiple times, which makes it impractical for large-scale model selection. In this paper, we present an approximate approach to CV for kernel methods based on the theoretical notion of the Bouligand influence function (BIF) and the Nyström method. We first establish the relationship between the BIF and CV, and propose a method to approximate CV via the Taylor expansion of the BIF. We then provide a novel method for computing the BIF under a general distribution, and evaluate it under the empirical sample distribution. Finally, we use the Nyström method to accelerate the computation of the BIF matrix, yielding the final approximate CV criterion. The proposed approximate CV requires training only once and is applicable to a wide variety of kernel methods. Experimental results on numerous datasets show that our approximate CV has no statistically significant discrepancy from the original CV, while substantially improving efficiency.
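The sketch below is not the paper's BIF-based procedure; it is a minimal NumPy illustration of the two ingredients the abstract mentions, in simplified form: a Nyström low-rank approximation of the kernel matrix, and a single-training approximation of leave-one-out CV (here the standard linear-smoother identity for kernel ridge regression rather than the BIF Taylor expansion). All function names and parameters are hypothetical and chosen for illustration only.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def nystrom_approximation(X, m, gamma=1.0, seed=None):
    """Rank-m Nystrom factors: K ~= C W^+ C^T with m random landmark points."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    C = rbf_kernel(X, X[idx], gamma)   # n x m cross-kernel block
    W = C[idx, :]                      # m x m landmark kernel block
    return C, np.linalg.pinv(W)

def approx_loo_cv_krr(K, y, lam):
    """Approximate LOO-CV error for kernel ridge regression with a single fit.

    Uses the standard linear-smoother identity
        y_i - f_{-i}(x_i) = (y_i - f(x_i)) / (1 - H_ii),
    where H = K (K + lam*I)^{-1} is the hat matrix, so no model is retrained.
    """
    n = len(y)
    H = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    loo_residuals = (y - H @ y) / (1.0 - np.diag(H))
    return np.mean(loo_residuals**2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 5))
    y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
    C, W_pinv = nystrom_approximation(X, m=20, gamma=0.5, seed=0)
    K_approx = C @ W_pinv @ C.T        # low-rank surrogate for the full kernel matrix
    print(approx_loo_cv_krr(K_approx, y, lam=1e-2))
```

The design point this illustrates is the one stated in the abstract: the CV criterion is evaluated from quantities available after a single training pass, and the kernel matrix it depends on is replaced by a cheap low-rank Nyström surrogate.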
