The 'K' in K-fold Cross Validation

The K-fold Cross Validation (KCV) technique is one of the approaches most widely used by practitioners for model selection and error estimation of classifiers. KCV consists of splitting a dataset into k subsets; then, iteratively, some of them are used to learn the model, while the others are used to assess its performance. However, despite the success of KCV, only practical rule-of-thumb methods exist for choosing the number and the cardinality of the subsets. We propose here an approach that allows the number of subsets of the KCV to be tuned in a data-dependent way, so as to obtain a reliable, tight and rigorous estimate of the probability of misclassification of the chosen model.
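The splitting procedure described above can be illustrated with a minimal sketch. It assumes the common variant in which k - 1 subsets are used for learning and the remaining one for assessment; it is not the data-dependent tuning method proposed in the paper. The helper names (`k_fold_indices`, `majority_label`, `k_fold_error`) and the toy majority-class learner are hypothetical, chosen only to keep the example self-contained.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    return [indices[i::k] for i in range(k)]


def majority_label(labels):
    """Toy learner (an assumption for illustration): predict the most frequent label."""
    return max(sorted(set(labels)), key=labels.count)


def k_fold_error(labels, k, seed=0):
    """Average misclassification rate over the k validation folds."""
    folds = k_fold_indices(len(labels), k, seed)
    fold_errors = []
    for i, validation in enumerate(folds):
        # Learn on every fold except the i-th one.
        training = [j for f, fold in enumerate(folds) if f != i for j in fold]
        prediction = majority_label([labels[j] for j in training])
        # Assess on the held-out fold.
        mistakes = sum(labels[j] != prediction for j in validation)
        fold_errors.append(mistakes / len(validation))
    return sum(fold_errors) / k


if __name__ == "__main__":
    labels = [0] * 70 + [1] * 30
    for k in (2, 5, 10):
        print(k, round(k_fold_error(labels, k), 3))
```

Running the script prints the averaged error for a few values of k on a synthetic label vector. A small k leaves fewer samples for learning, while a large k leaves fewer samples in each validation fold, which is the trade-off that motivates choosing k with care.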
