K-Fold Generalization Capability Assessment for
The problem of how to effectively implement k-fold cross-validation for Support Vector Machines is here considered. Indeed, despite the fact that this selection criterion is widely used, due to its reasonable requirements in terms of computational resources and its good ability to identify a well-performing model, it is not clear how one should employ the committee of classifiers coming from the k folds for the task of on-line classification. Three methods are here described and tested, based respectively on: averaging, random choice and majority voting. Each of these methods is tested on a wide range of data-sets for different fold settings.

I. INTRODUCTION

K-fold Cross-Validation (KCV) is one of the most widely adopted criteria for assessing the performance of a model and for selecting a hypothesis within a class. An advantage of this method, over simple training-test data splitting, is the repeated use of the whole available data both for building a learning machine and for testing it, thus reducing the risk of (un)lucky splitting.

Despite the fact that KCV does not have a dedicated theory which guarantees ad-hoc bounds on the generalization error, and that the variance of the estimated true error is hard to assess [1], this method is currently widely used in many fields for different problem types, and performs like the best available selection criteria while requiring a moderate computational overhead [6].
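The three committee strategies named above (averaging, random choice, majority voting) can be sketched as follows. This is a minimal illustration using scikit-learn's `SVC` as the base classifier; the function names are hypothetical and the paper's exact implementation is not reproduced here. The averaging variant assumes a binary problem with labels {0, 1}, combining the members' decision-function outputs rather than their hard predictions.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.svm import SVC


def train_kfold_committee(X, y, k=5):
    """Train one SVM per fold, each on the remaining k-1 folds."""
    models = []
    for train_idx, _ in KFold(n_splits=k, shuffle=True, random_state=0).split(X):
        models.append(SVC(kernel="rbf").fit(X[train_idx], y[train_idx]))
    return models


def predict_majority(models, X):
    """Majority voting over the k members' hard predictions."""
    votes = np.stack([m.predict(X) for m in models])  # shape (k, n_samples)
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)


def predict_random(models, X, seed=0):
    """Random choice: each sample is classified by one randomly drawn member."""
    rng = np.random.default_rng(seed)
    preds = np.stack([m.predict(X) for m in models])
    picks = rng.integers(len(models), size=len(X))
    return preds[picks, np.arange(len(X))]


def predict_average(models, X):
    """Averaging: mean of the members' decision-function values, thresholded
    at zero (assumes a binary problem with labels {0, 1})."""
    scores = np.mean([m.decision_function(X) for m in models], axis=0)
    return (scores > 0).astype(int)
```

Majority voting and random choice generalize directly to multi-class problems, while the averaging rule as written is specific to the binary decision-function case.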
[1] Vladimir Vapnik, et al. Statistical learning theory, 1998.
[2] Gunnar Rätsch, et al. Soft Margins for AdaBoost, 2001, Machine Learning.
[3] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.
[4] S. Sathiya Keerthi, et al. Evaluation of simple performance measures for tuning SVM hyperparameters, 2003, Neurocomputing.
[5] John Langford, et al. Beating the hold-out: bounds for K-fold and progressive cross-validation, 1999, COLT '99.