A Re-evaluation of the Over-Searching Phenomenon in Inductive Rule Learning

Most commonly used inductive rule learning algorithms employ a hill-climbing search, whereas local pattern discovery algorithms employ exhaustive search. In this paper, we evaluate a spectrum of search strategies in order to see whether separate-and-conquer rule learning algorithms can gain predictive accuracy or reduce theory size by using more powerful search strategies such as beam search or exhaustive search. Unlike previous work, which demonstrated that rule learning algorithms suffer from over-searching, we pay particular attention to the connection between the search heuristic and the search strategy, and we show that for some rule evaluation functions, more complex search algorithms consistently improve results without suffering from the over-searching phenomenon. In particular, we will see that this is typically the case for heuristics that perform poorly in a hill-climbing search. We interpret this as evidence that commonly used rule learning heuristics conflate two different aspects: a rule evaluation metric, which measures the predictive quality of a rule, and a search heuristic, which captures the potential of a candidate rule to be refined into a highly predictive rule. For effective exhaustive search, these two aspects need to be clearly separated.
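To make the contrast between search strategies concrete, the following is a minimal sketch (not taken from the paper) of top-down rule refinement over a toy boolean dataset. The dataset, attribute names, and the Laplace-corrected precision heuristic are illustrative assumptions; the only point is that setting the beam width to 1 reduces the search to hill-climbing, while a wider beam explores more refinements per step.

```python
# Toy binary-class dataset: (attribute dict, label).
# Attribute names and labels are illustrative, not from the paper.
DATA = [
    ({"a": 1, "b": 0, "c": 1}, 1),
    ({"a": 1, "b": 1, "c": 0}, 1),
    ({"a": 0, "b": 1, "c": 1}, 1),
    ({"a": 1, "b": 0, "c": 0}, 0),
    ({"a": 0, "b": 0, "c": 1}, 0),
    ({"a": 0, "b": 1, "c": 0}, 0),
]

def covers(rule, example):
    """A rule is a tuple of (attribute, value) conditions; all must hold."""
    return all(example[a] == v for a, v in rule)

def laplace(rule, data):
    """Laplace-corrected precision: (p + 1) / (p + n + 2)."""
    covered = [label for x, label in data if covers(rule, x)]
    p = sum(covered)
    n = len(covered) - p
    return (p + 1) / (p + n + 2)

def beam_search(data, attributes, heuristic, beam_width=3, max_conds=3):
    """Top-down refinement; beam_width=1 is plain hill-climbing."""
    beam = [()]          # start from the empty (most general) rule
    best = ()
    for _ in range(max_conds):
        # Refine every rule in the beam by one extra condition.
        candidates = {
            rule + ((a, v),)
            for rule in beam
            for a in attributes
            for v in (0, 1)
            if a not in dict(rule)
        }
        if not candidates:
            break
        ranked = sorted(candidates, key=lambda r: heuristic(r, data),
                        reverse=True)
        beam = ranked[:beam_width]
        best = max([best] + beam, key=lambda r: heuristic(r, data))
    return best

hill = beam_search(DATA, ["a", "b", "c"], laplace, beam_width=1)
wide = beam_search(DATA, ["a", "b", "c"], laplace, beam_width=4)
# A wider beam can never find a worse rule under the same heuristic
# (on the training data; the paper's question is about generalization).
assert laplace(wide, DATA) >= laplace(hill, DATA)
```

Note that the same function (`laplace`) is used here both to rank refinements during the search and to evaluate the final rule; the abstract's point is precisely that these two roles should be played by different functions when moving to more powerful search strategies.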
