Automated Empirical Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods and Multiple Testing

This paper proposes a method for multiple testing based on recursive iteration of resampling methods for rule induction. The method generates training samples and test samples in a two-level hierarchical way and compares the results between the two levels, which corresponds to a second-order approximation of estimators in the Edgeworth expansion. We applied this method, MULT-RECITE-R, to three newly collected medical databases and seven UCI databases. The results show that the method selects the best estimation method in almost all cases.
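The two-level resampling idea can be illustrated with a minimal sketch. It rests on several assumptions and is not the authors' MULT-RECITE-R implementation. A Holte-style 1R learner stands in for the rule induction methods. The candidate estimators (resubstitution, holdout, 5-fold cross-validation, bootstrap out-of-bag), the 50/50 level-1 split, and all function names (`train_one_r`, `select_estimator`, etc.) are hypothetical choices made for illustration. The multiple-testing step that compares estimators statistically is omitted. At level 1, the data are split into a training set and a test set. At level 2, each estimator sees only the level-1 training set, and its estimate is scored by how closely it matches the error measured on the level-1 test set. The estimator with the smallest mean gap is selected.

```python
import random
from collections import Counter, defaultdict
from statistics import mean


def train_one_r(data):
    """Hypothetical stand-in rule learner (Holte-style 1R): rules on the single best feature."""
    default = Counter(y for _, y in data).most_common(1)[0][0]
    best = None
    for f in range(len(data[0][0])):
        by_value = defaultdict(Counter)
        for x, y in data:
            by_value[x[f]][y] += 1
        # Each value of feature f predicts its majority class.
        rules = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(rules[x[f]] != y for x, y in data)
        if best is None or errors < best[0]:
            best = (errors, f, rules)
    _, feat, best_rules = best
    return lambda x: best_rules.get(x[feat], default)


def error_rate(model, data):
    return sum(model(x) != y for x, y in data) / len(data)


# Hypothetical candidate estimators; each one sees only the data it is given.
def resubstitution(data, rng):
    return error_rate(train_one_r(data), data)


def holdout(data, rng, frac=2 / 3):
    d = data[:]
    rng.shuffle(d)
    cut = int(len(d) * frac)
    return error_rate(train_one_r(d[:cut]), d[cut:])


def cross_validation(data, rng, k=5):
    d = data[:]
    rng.shuffle(d)
    folds = [d[i::k] for i in range(k)]
    errs = []
    for i in range(k):
        train = [e for j, fold in enumerate(folds) if j != i for e in fold]
        errs.append(error_rate(train_one_r(train), folds[i]))
    return mean(errs)


def bootstrap_oob(data, rng, b=20):
    n, errs = len(data), []
    for _ in range(b):
        idx = [rng.randrange(n) for _ in range(n)]
        drawn = set(idx)
        oob = [data[i] for i in range(n) if i not in drawn]
        if oob:  # skip the (rare) replicate with no out-of-bag cases
            errs.append(error_rate(train_one_r([data[i] for i in idx]), oob))
    return mean(errs)


def select_estimator(data, estimators, repeats=30, seed=0):
    rng = random.Random(seed)
    gaps = {name: [] for name in estimators}
    for _ in range(repeats):
        d = data[:]
        rng.shuffle(d)
        train, test = d[: len(d) // 2], d[len(d) // 2 :]
        target = error_rate(train_one_r(train), test)  # level 1: error on the held-out half
        for name, estimate in estimators.items():
            # level 2: the estimator resamples the level-1 training set only
            gaps[name].append(abs(estimate(train, rng) - target))
    scores = {name: mean(g) for name, g in gaps.items()}
    return min(scores, key=scores.get), scores


# Synthetic data: the class depends on feature 0, with 20% label noise.
data_rng = random.Random(42)
data = []
for _ in range(120):
    x = tuple(data_rng.choice("abc") for _ in range(4))
    y = "pos" if x[0] == "a" else "neg"
    if data_rng.random() < 0.2:
        y = "neg" if y == "pos" else "pos"
    data.append((x, y))

ESTIMATORS = {
    "resubstitution": resubstitution,
    "holdout": holdout,
    "5-fold CV": cross_validation,
    "bootstrap OOB": bootstrap_oob,
}
best, scores = select_estimator(data, ESTIMATORS)
print(best, {k: round(v, 3) for k, v in scores.items()})
```

The design mirrors the recursion described above. An estimator applied to the level-1 training set is asked to predict the level-1 test error, just as the same estimator applied to the full data set is meant to predict error on future cases.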
