Choosing Between Two Classification Learning Algorithms Based on Calibrated Balanced $$5\times 2$$ Cross-Validated F-Test

Abstract: The $$5\times 2$$ cross-validated F-test, based on five independent replications of 2-fold cross-validation, is recommended for choosing between two classification learning algorithms. However, reusing the same data across the five replications causes the real degrees of freedom (DOF) of the test to be lower than those of the F(10, 5) distribution given by Alpaydin (Neural Comput 11:1885–1892, [2]), which makes the test prone to inflated type I and type II errors. Moreover, the random partitions used in $$5\times 2$$ cross-validation make the DOF of the test difficult to analyze. Wang et al. (Neural Comput 26(1):208–235, [8]) addressed this by proposing a blocked $$3\times 2$$ cross-validation that accounts for the correlation between any two 2-fold cross-validations. Building on that work, this study puts forward a calibrated balanced $$5\times 2$$ cross-validated F-test that follows an F(7, 5) distribution, obtained by calibrating the DOF of the F(10, 5) distribution. Studies on simulated and real data demonstrate that, in most cases, the calibrated balanced $$5\times 2$$ cross-validated F-test has lower type I and type II errors than the $$5\times 2$$ cross-validated F-test based on F(10, 5).
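For concreteness, below is a minimal Python sketch of the standard $$5\times 2$$ cross-validated F-test of Alpaydin [2], the baseline against which the calibrated test is compared. The statistic sums the squared per-fold error differences over all five replications and divides by twice the sum of the per-replication variance estimates; under the null hypothesis it is referred to an F(10, 5) distribution. The calibrated variant proposed in this paper instead refers a calibrated statistic to F(7, 5); the exact calibration, which follows the blocked construction of Wang et al. [8], is not reproduced here. The function name and the scikit-learn-based setup are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy import stats
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold

def five_by_two_cv_f_test(clf_a, clf_b, X, y, dfn=10, dfd=5, seed=0):
    """5x2cv F-test (Alpaydin, 1999). Returns (F statistic, p-value).
    The p-value is taken from F(dfn, dfd); dfn=10 is the standard test,
    while dfn=7 mimics the calibrated reference distribution F(7, 5)
    (the calibrated statistic itself may differ; this is a sketch)."""
    rng = np.random.RandomState(seed)
    diffs = np.zeros((5, 2))  # p_i^(j): error-rate differences per fold
    for i in range(5):        # five independent 2-fold replications
        skf = StratifiedKFold(n_splits=2, shuffle=True,
                              random_state=rng.randint(2**31 - 1))
        for j, (tr, te) in enumerate(skf.split(X, y)):
            err_a = np.mean(clone(clf_a).fit(X[tr], y[tr]).predict(X[te]) != y[te])
            err_b = np.mean(clone(clf_b).fit(X[tr], y[tr]).predict(X[te]) != y[te])
            diffs[i, j] = err_a - err_b
    p_bar = diffs.mean(axis=1)                        # mean difference per replication
    s2 = ((diffs - p_bar[:, None]) ** 2).sum(axis=1)  # variance estimates s_i^2
    f_stat = (diffs ** 2).sum() / (2.0 * s2.sum())    # Alpaydin's combined F statistic
    return f_stat, stats.f.sf(f_stat, dfn, dfd)       # upper-tail p-value
```

A hypothetical usage, comparing two off-the-shelf classifiers on a built-in dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
f_stat, p = five_by_two_cv_f_test(DecisionTreeClassifier(random_state=0),
                                  GaussianNB(), X, y)
print(f"F = {f_stat:.3f}, p = {p:.4f}")  # reject "equal error rates" if p < 0.05
```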

[1] Olcay Taner Yildiz, et al. Omnivariate Rule Induction Using a Novel Pairwise Statistical Test, 2013, IEEE Transactions on Knowledge and Data Engineering.

[2] Ethem Alpaydın, et al. Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms, 1999, Neural Comput.

[3] Ethem Alpaydin, et al. Ordering and finding the best of K > 2 supervised learning algorithms, 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4] Ethem Alpaydin, et al. Cost-Conscious Comparison of Supervised Learning Algorithms over Multiple Data Sets, 2008.

[5] F. E. Satterthwaite. An approximate distribution of estimates of variance components, 1946, Biometrics Bulletin.

[6] Janez Demsar, et al. Statistical Comparisons of Classifiers over Multiple Data Sets, 2006, J. Mach. Learn. Res.

[7] Yu Wang, et al. Measure for data partitioning in m × 2 cross-validation, 2015, Pattern Recognit. Lett.

[8] Yu Wang, et al. Blocked 3×2 Cross-Validated t-Test for Comparing Supervised Classification Learning Algorithms, 2014, Neural Comput.

[9] George Hripcsak, et al. Analysis of Variance of Cross-Validation Estimators of the Generalization Error, 2005, J. Mach. Learn. Res.

[10] Thomas G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms, 1998, Neural Computation.

[11] Vijayan N. Nair, et al. Methods for Identifying Dispersion Effects in Unreplicated Factorial Experiments, 2001, Technometrics.

[12] Yoshua Bengio, et al. No Unbiased Estimator of the Variance of K-Fold Cross-Validation, 2003, J. Mach. Learn. Res.

[13] Eibe Frank, et al. Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms, 2004, PAKDD.

[14] Yves Grandvalet. Hypothesis Testing for Cross-Validation.

[15] S. García, et al. An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons, 2008, J. Mach. Learn. Res.

[16] Weijie Chen, et al. Classifier variability: Accounting for training and testing, 2012, Pattern Recognit.

[17] Remco R. Bouckaert, et al. Choosing Between Two Learning Algorithms Based on Calibrated Tests, 2003, ICML.

[18] Yoshua Bengio, et al. Inference for the Generalization Error, 1999, Machine Learning.