C4.5 competence map: a phase transition-inspired approach

Determining a priori whether a learning algorithm is suited to a given learning problem instance is a major scientific and technological challenge. This paper presents a first step toward that goal, inspired by the Phase Transition (PT) paradigm developed in the Constraint Satisfaction domain. Based on the PT paradigm, extensive and principled experiments allow the Competence Map associated with a learning algorithm to be constructed, describing the regions where the algorithm on average fails or succeeds. The approach is illustrated on the long-standing and widely used C4.5 algorithm. A non-trivial failure region is observed in the landscape of k-term DNF languages, and several interpretations of the experimental results are offered.
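A minimal sketch of the kind of experiment the abstract describes, under stated assumptions: scikit-learn's CART decision tree stands in for C4.5 (the paper uses C4.5 itself), target concepts are random k-term DNF formulas over Boolean variables, and the grid values, sample sizes, and the helper names `random_k_term_dnf`, `label`, and `competence` are illustrative choices, not the paper's protocol. Averaging test accuracy over random instances at each point of the control-parameter grid is what charts the success and failure regions of a competence map.

```python
# Sketch: estimate a learner's average competence over random
# k-term DNF problem instances, sweeping a control-parameter grid.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def random_k_term_dnf(n_vars, k, literals_per_term):
    """Draw k random terms, each a conjunction of `literals_per_term`
    distinct literals over n_vars Boolean variables."""
    terms = []
    for _ in range(k):
        idx = rng.choice(n_vars, size=literals_per_term, replace=False)
        signs = rng.integers(0, 2, size=literals_per_term)  # 1 = positive literal
        terms.append((idx, signs))
    return terms

def label(X, terms):
    """An example is positive iff it satisfies at least one term."""
    y = np.zeros(len(X), dtype=int)
    for idx, signs in terms:
        y |= np.all(X[:, idx] == signs, axis=1).astype(int)
    return y

def competence(n_vars=20, k=3, literals_per_term=5,
               n_train=500, n_test=2000, n_trials=10):
    """Average test accuracy over random problem instances at one
    point of the control-parameter grid."""
    accs = []
    for _ in range(n_trials):
        terms = random_k_term_dnf(n_vars, k, literals_per_term)
        X_tr = rng.integers(0, 2, size=(n_train, n_vars))
        X_te = rng.integers(0, 2, size=(n_test, n_vars))
        tree = DecisionTreeClassifier().fit(X_tr, label(X_tr, terms))
        accs.append(tree.score(X_te, label(X_te, terms)))
    return float(np.mean(accs))

# Sweep the (k, literals-per-term) plane; low-accuracy cells mark
# the failure region of the resulting competence map.
for k in (1, 3, 5, 7):
    for l in (2, 4, 6, 8):
        print(f"k={k} l={l} acc={competence(k=k, literals_per_term=l):.3f}")
```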
