Automated Selection of Rule Induction Methods Based on Recursive Iteration of Resampling Methods and Multiple Statistical Testing

One of the most important problems in rule induction is estimating which method is best suited to an applied domain: a method that works well in some domains may perform poorly in others, so choosing among the methods is difficult. For this purpose, we introduce multiple testing based on recursive iteration of resampling methods for rule induction (MULT-RECITE-R). This method consists of four procedures, organized into an inner loop and an outer loop. First, the original training samples (S0) are randomly split into new training samples (S1) and test samples (T1) using a resampling scheme. Second, S1 is again split into training samples (S2) and test samples (T2) using the same resampling scheme; rule induction methods are applied, and predefined metrics are calculated. This second procedure, the inner loop, is repeated 10,000 times. Third, rule induction methods are applied to S1, and the metrics calculated on T1 are compared with those calculated on T2. If the metrics derived from T2 predict those from T1, we count this as a success. The second and third procedures together, the outer loop, are iterated 10,000 times. Fourth, the overall results are interpreted, and the best method is selected if the resampling scheme performs well. To evaluate this system, we apply MULT-RECITE-R to three UCI databases. The results show that this method gives a statistically sound selection of estimation methods.
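The double-loop procedure described above can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the two toy "induction methods" (a majority-class rule and a one-feature threshold rule), the 2/3-1/3 subsampling split, accuracy as the predefined metric, and the reduced iteration counts (the paper uses 10,000) are all assumptions made for brevity.

```python
# Illustrative sketch of the MULT-RECITE-R double resampling loop.
# Toy induction methods, the split ratio, and the small iteration
# counts are assumptions; the paper iterates 10,000 times.
import random

def majority_rule(train):
    # "Induce" a rule that always predicts the training majority class.
    labels = [y for _, y in train]
    pred = max(set(labels), key=labels.count)
    return lambda x: pred

def threshold_rule(train):
    # "Induce" a rule that thresholds the single feature at the
    # midpoint between the two class means.
    means = {}
    for c in set(y for _, y in train):
        xs = [x for x, y in train if y == c]
        means[c] = sum(xs) / len(xs)
    if len(means) == 1:                     # degenerate split: one class only
        only = next(iter(means))
        return lambda x: only
    lo, hi = sorted(means, key=means.get)
    cut = (means[lo] + means[hi]) / 2
    return lambda x: hi if x > cut else lo

def accuracy(rule, test):
    return sum(rule(x) == y for x, y in test) / len(test)

def split(samples, rng):
    # Random subsampling: 2/3 training, 1/3 test.
    s = samples[:]
    rng.shuffle(s)
    k = 2 * len(s) // 3
    return s[:k], s[k:]

def mult_recite_r(s0, methods, outer=20, inner=20, seed=0):
    rng = random.Random(seed)
    successes = 0
    for _ in range(outer):
        s1, t1 = split(s0, rng)             # first split: S0 -> S1, T1
        # Inner loop: estimate each method's metric from S2/T2 splits of S1.
        inner_acc = {name: 0.0 for name in methods}
        for _ in range(inner):
            s2, t2 = split(s1, rng)         # second split: S1 -> S2, T2
            for name, m in methods.items():
                inner_acc[name] += accuracy(m(s2), t2) / inner
        # Outer evaluation: induce from S1, measure on T1.
        outer_acc = {name: accuracy(m(s1), t1) for name, m in methods.items()}
        # Success if the inner-loop ranking predicts the outer-loop ranking.
        if max(inner_acc, key=inner_acc.get) == max(outer_acc, key=outer_acc.get):
            successes += 1
    return successes / outer                # fraction of successful predictions

# Toy one-feature data: class 'a' near 0, class 'b' near 1.
rng = random.Random(1)
data = [(rng.gauss(0.0, 0.3), 'a') for _ in range(30)] + \
       [(rng.gauss(1.0, 0.3), 'b') for _ in range(30)]
rate = mult_recite_r(data, {'majority': majority_rule,
                            'threshold': threshold_rule})
print(rate)
```

A high success rate indicates that the inner resampling loop reliably predicts which method will win on the held-out T1 data, which is the condition under which the final method selection is trusted.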
