Shorter Rules Are Better, Aren't They?

It is conventional wisdom in inductive rule learning that shorter rules should be preferred over longer rules, a principle also known as Occam's Razor. This is typically justified by the fact that longer rules tend to be more specific and are therefore more likely to overfit the data. In this position paper, we challenge this assumption by demonstrating that variants of conventional rule learning heuristics, so-called inverted heuristics, learn longer rules that are not more specific than the shorter rules learned by conventional heuristics. Moreover, we argue, using several examples, that such longer rules may in many cases be more understandable than shorter rules, again contradicting a widely held view. This is relevant not only for subgroup discovery but also for related concepts such as characteristic rules, formal concept analysis, and closed itemsets.
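
To make the contrast between heuristics concrete, below is a minimal Python sketch of two standard rule evaluation heuristics (precision and the Laplace estimate) together with one possible "inverted" counterpart. The counts p, n, P, N, the function names, and in particular the inverted_precision formula are illustrative assumptions on our part, intended only to convey the idea of evaluating a rule from the opposite corner (N, P) of coverage space; they are not taken from the paper.

    # Minimal sketch (not the authors' implementation): rule evaluation
    # heuristics on coverage counts. A rule covers p of P positive and
    # n of N negative training examples.

    def precision(p, n, P, N):
        """Fraction of covered examples that are positive."""
        return p / (p + n) if p + n > 0 else 0.0

    def laplace(p, n, P, N):
        """Laplace-corrected precision, less biased toward tiny rules."""
        return (p + 1) / (p + n + 2)

    def inverted_precision(p, n, P, N):
        """Illustrative 'inverted' heuristic (an assumption, not the paper's
        exact formula): rewards rules that exclude many negatives relative
        to the positives they exclude, i.e. it is anchored at the opposite
        corner (N, P) of coverage space rather than at the origin."""
        excluded_neg, excluded_pos = N - n, P - p
        total_excluded = excluded_neg + excluded_pos
        return excluded_neg / total_excluded if total_excluded > 0 else 1.0

    # A highly specific rule versus a more general rule:
    P, N = 50, 50
    specific = (5, 0)    # covers 5 positives, 0 negatives
    general  = (40, 10)  # covers 40 positives, 10 negatives

    for name, h in [("precision", precision), ("laplace", laplace),
                    ("inverted", inverted_precision)]:
        print(name,
              round(h(*specific, P, N), 3),
              round(h(*general, P, N), 3))

In this toy setup, precision prefers the maximally specific rule (1.0 vs. 0.8), whereas the illustrative inverted variant prefers the more general rule (0.526 vs. 0.8). This is only meant to gesture at why an evaluation anchored at the opposite corner of coverage space need not be biased toward highly specific rule bodies, which is the kind of behaviour the paper attributes to inverted heuristics.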
