An Extension on "Statistical Comparisons of Classifiers over Multiple Data Sets" for all Pairwise Comparisons

In a recently published paper in JMLR, Demšar (2006) recommends a set of non-parametric statistical tests and procedures which can be safely used for comparing the performance of classifiers over multiple data sets. After studying the paper, we find that it correctly introduces the basic procedures and some of the most advanced ones for comparisons against a control method. However, it does not deal with some advanced topics in depth. Regarding these topics, we focus on more powerful statistical procedures for comparing n × n classifiers, that is, for all pairwise comparisons. Moreover, we illustrate an easy way of obtaining adjusted and comparable p-values in multiple comparison procedures.
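To make the idea of adjusted p-values concrete, here is a minimal sketch of one well-known adjustment, Holm's step-down procedure (Holm, 1979). It is only an illustration and is not claimed to be the specific procedure this paper recommends; the function name `holm_adjusted_p_values` and the sample p-values are hypothetical. When all pairs among k classifiers are compared, the input would hold m = k(k-1)/2 unadjusted p-values.

```python
def holm_adjusted_p_values(p_values):
    """Return Holm-adjusted p-values in the same order as the input.

    Hypothetical helper for illustration. With the p-values sorted in
    ascending order as p_(1) <= ... <= p_(m), the adjusted value is
    min(1, max over j <= i of (m - j + 1) * p_(j)).
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # rank is 0-based, so the multiplier (m - rank) equals (m - j + 1).
        candidate = (m - rank) * p_values[idx]
        # The running maximum keeps the adjusted values monotone.
        running_max = max(running_max, candidate)
        adjusted[idx] = min(1.0, running_max)
    return adjusted


# Made-up unadjusted p-values for three pairwise comparisons.
example = [0.01, 0.04, 0.03]
print(holm_adjusted_p_values(example))
```

An adjusted p-value can be compared directly with the chosen significance level, which is what makes results from different hypotheses comparable.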

[1]  M. Friedman The Use of Ranks to Avoid the Assumption of Normality Implicit in the Analysis of Variance , 1937 .

[2]  S. Holm A Simple Sequentially Rejective Multiple Test Procedure , 1979 .

[3]  R. Iman,et al.  Approximations of the critical region of the Friedman statistic , 1980 .

[4]  J. Shaffer Modified Sequentially Rejective Multiple Test Procedures , 1986 .

[5]  R. Simes,et al.  An improved Bonferroni procedure for multiple tests of significance , 1986 .

[6]  Y. Hochberg A sharper Bonferroni procedure for multiple tests of significance , 1988 .

[7]  G. Hommel,et al.  Improvements of General Multiple Test Procedures for Redundant Systems of Hypotheses , 1988 .

[8]  G. Hommel A stagewise rejective multiple test procedure based on a modified Bonferroni test , 1988 .

[9]  D. Rom A sequentially rejective test procedure based on a modified Bonferroni inequality , 1990 .

[10]  G. McLachlan Discriminant Analysis and Statistical Pattern Recognition , 1992 .

[11]  S. P. Wright,et al.  Adjusted P-values for simultaneous inference , 1992 .

[12]  S. S. Young,et al.  Resampling-Based Multiple Testing: Examples and Methods for p-Value Adjustment , 1993 .

[13]  Usama M. Fayyad,et al.  Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning , 1993, IJCAI.

[14]  Alberto Maria Segre,et al.  Programs for Machine Learning , 1994 .

[15]  G. E. Thomas Resampling‐Based Multiple Testing: Examples and Methods for p‐Value Adjustment , 1994 .

[16]  G. Hommel,et al.  A rapid algorithm and a computer program for multiple test procedures using logical structures of hypotheses , 1994, Computer methods and programs in biomedicine.

[17]  J. Shaffer Multiple Hypothesis Testing , 1995 .

[18]  Yosef Hochberg,et al.  Extensions of multiple testing procedures based on Simes' test , 1995 .

[19]  G. Hommel,et al.  Bonferroni procedures for logically related hypotheses , 1999 .

[20]  Maliha S. Nash,et al.  Handbook of Parametric and Nonparametric Statistical Procedures , 2001, Technometrics.

[21]  Peter Clark,et al.  The CN2 Induction Algorithm , 1989, Machine Learning.

[22]  Joseph F. Murray,et al.  Machine Learning Methods for Predicting Failures in Hard Drives: A Multiple-Instance Application , 2005, J. Mach. Learn. Res.

[23]  Nicolás García-Pedrajas,et al.  Immune Network based Ensembles , 2007, ESANN.

[24]  Janez Demšar,et al.  Statistical Comparisons of Classifiers over Multiple Data Sets , 2006, J. Mach. Learn. Res.

[25]  Art B. Owen,et al.  Infinitely Imbalanced Logistic Regression , 2007, J. Mach. Learn. Res.

[26]  Shaul Markovitch,et al.  Anytime Learning of Decision Trees , 2007, J. Mach. Learn. Res.

[27]  Rafael Morales Bueno,et al.  Learning in Environments with Unknown Dynamics: Towards more Robust Concept Learners , 2007, J. Mach. Learn. Res.

[28]  Lawrence O. Hall,et al.  A Comparison of Decision Tree Ensemble Creation Techniques , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[29]  Geoffrey I. Webb,et al.  Classifying under computational resource constraints: anytime classification using probabilistic estimators , 2007, Machine Learning.

[30]  Geoffrey I. Webb,et al.  To Select or To Weigh: A Comparative Study of Linear Combination Schemes for SuperParent-One-Dependence Estimators , 2007, IEEE Transactions on Knowledge and Data Engineering.

[31]  Robert P. W. Duin,et al.  Maximizing the area under the ROC curve by pairwise feature combination , 2008, Pattern Recognit.

[32]  María José del Jesús,et al.  KEEL: a software tool to assess evolutionary algorithms for data mining problems , 2008, Soft Comput.