On the quest for optimal rule learning heuristics

The primary goal of the research reported in this paper is to identify which criteria are responsible for the good performance of a heuristic rule evaluation function in a greedy top-down covering algorithm. We first argue that search heuristics for inductive rule learning algorithms typically trade off consistency and coverage, and we investigate this trade-off by determining optimal parameter settings for five different parametrized heuristics. To avoid biasing our study toward known functional families, we also investigate the potential of using metalearning to obtain alternative rule learning heuristics. The key results of this experimental study are practical default values for commonly used heuristics, a broad comparative evaluation of known and novel rule learning heuristics, and theoretical insights into the factors that are responsible for good performance. For example, we observe that consistency should be weighted more heavily than coverage, presumably because a lack of coverage can later be corrected by learning additional rules.
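To make the consistency/coverage trade-off concrete, the following minimal sketch uses the m-estimate as one example of a parametrized heuristic. The abstract does not name the five heuristics studied, so this choice is an assumption made for illustration. The function name `m_estimate`, the two candidate rules, and all counts are hypothetical and are not taken from the paper's experiments. Here `p` and `n` are the positive and negative examples covered by a rule, and `P` and `N` are the totals in the training set.

```python
def m_estimate(p, n, P, N, m):
    """m-estimate of a rule's precision.

    p, n: positives / negatives covered by the rule
    P, N: positives / negatives in the whole training set
    m:    trade-off parameter; m = 0 gives plain precision p / (p + n),
          larger m pulls the estimate toward the class prior P / (P + N),
          which penalizes rules with low coverage.
    """
    prior = P / (P + N)
    return (p + m * prior) / (p + n + m)


# Hypothetical data set and candidate rules (illustrative numbers only).
P, N = 50, 50
pure_but_small = (2, 0)       # perfectly consistent, very low coverage
broad_but_impure = (40, 10)   # less consistent, high coverage

for m in (0, 1, 10, 100):
    a = m_estimate(*pure_but_small, P, N, m)
    b = m_estimate(*broad_but_impure, P, N, m)
    winner = "pure_but_small" if a > b else "broad_but_impure"
    print(f"m={m:>3}: pure={a:.3f} broad={b:.3f} -> prefers {winner}")
```

In this sketch, small values of `m` favor the consistent rule and larger values favor the rule with more coverage, so tuning `m` amounts to choosing a point on the trade-off that the study investigates.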
