Finding a short and accurate decision rule in disjunctive normal form by exhaustive search

Greedy approaches suffer from a restricted search space, which can lead to suboptimal classifiers in terms of both performance and classifier size. This study discusses exhaustive search as an alternative to greedy search for learning short and accurate decision rules. We present the Exhaustive Procedure for LOgic-Rule Extraction (EXPLORE), an algorithm that induces decision rules in disjunctive normal form (DNF) in a systematic and efficient manner. We propose a subsumption-based method that, by taking the relational operator into account, reduces the number of values considered for instantiation in the literals without loss of performance. Furthermore, we describe a branch-and-bound approach that makes optimal use of user-defined performance constraints. To improve generalizability, we use a validation set to determine the optimal length of the DNF rule. The performance and size of the DNF rules induced by EXPLORE are compared to those of eight well-known rule learners. Our results show that an exhaustive approach to rule learning in DNF yields significantly smaller classifiers than those of the other rule learners, while achieving comparable or even better performance. Clearly, exhaustive search is computationally intensive and may not always be feasible. Nevertheless, based on this study, we believe that exhaustive search should be considered an alternative to greedy search in many problems.
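To make the central idea of the abstract concrete, the sketch below shows one way an exhaustive DNF rule search with a branch-and-bound cut on a user-defined performance constraint could be organized. It is a minimal illustration, not the authors' EXPLORE implementation: the function names (`search`, `sens_spec`, `covers`), the parameters (`max_terms`, `max_literals`, `min_sens`), and the literal encoding are assumed for this example, and the paper's subsumption-based value reduction and validation-set selection of rule length are omitted.

```python
from itertools import combinations

def covers(term, x):
    # A conjunctive term is a tuple of (feature_index, op, threshold) literals;
    # an example x (a list of feature values) is covered when every literal holds.
    return all((x[f] <= t) if op == "<=" else (x[f] > t) for f, op, t in term)

def rule_covers(rule, x):
    # A DNF rule covers an example when any of its conjunctive terms does.
    return any(covers(term, x) for term in rule)

def sens_spec(rule, X, y):
    # Sensitivity and specificity of the rule "predict positive if covered".
    tp = sum(1 for x, lab in zip(X, y) if lab == 1 and rule_covers(rule, x))
    fp = sum(1 for x, lab in zip(X, y) if lab == 0 and rule_covers(rule, x))
    pos, neg = sum(y), len(y) - sum(y)
    return (tp / pos if pos else 0.0), ((neg - fp) / neg if neg else 1.0)

def search(X, y, literals, max_terms=2, max_literals=2, min_sens=0.8):
    """Exhaustively grow a DNF rule one conjunctive term at a time.
    Branch-and-bound cut: adding a disjunct never lowers sensitivity and
    never raises specificity, so a branch whose specificity already falls
    at or below that of the incumbent best rule can be discarded."""
    # All candidate conjunctive terms of up to max_literals literals.
    terms = [c for k in range(1, max_literals + 1)
             for c in combinations(literals, k)]
    best = {"rule": None, "spec": -1.0}

    def extend(rule, start):
        sens, spec = sens_spec(rule, X, y) if rule else (0.0, 1.0)
        if rule and spec <= best["spec"]:
            return  # prune: no extension of this branch can beat the incumbent
        if rule and sens >= min_sens and spec > best["spec"]:
            best["rule"], best["spec"] = list(rule), spec
        if len(rule) < max_terms:
            for i in range(start, len(terms)):
                extend(rule + [terms[i]], i + 1)

    extend([], 0)
    return best["rule"], best["spec"]

# Toy usage on a hypothetical dataset with two numeric features.
X = [[1.0, 3.0], [2.0, 1.0], [0.5, 4.0], [3.0, 0.5]]
y = [1, 1, 0, 0]
literals = [(0, "<=", 2.0), (1, ">", 1.0)]
print(search(X, y, literals))
```

The pruning step relies only on the monotonicity of disjunction (more terms can only increase coverage), which is why a user-defined constraint such as minimum sensitivity combined with a best-so-far specificity gives a valid bound for cutting branches during the exhaustive enumeration.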
