Boolean Decision Rules via Column Generation

This paper considers the learning of Boolean rules in either disjunctive normal form (DNF, OR-of-ANDs, equivalent to decision rule sets) or conjunctive normal form (CNF, AND-of-ORs) as an interpretable model for classification. An integer program is formulated to optimally trade classification accuracy for rule simplicity. Column generation (CG) is used to efficiently search over an exponential number of candidate clauses (conjunctions or disjunctions) without the need for heuristic rule mining. This approach also bounds the gap between the selected rule set and the best possible rule set on the training data. To handle large datasets, we propose an approximate CG algorithm using randomization. Compared to three recently proposed alternatives, the CG algorithm dominates the accuracy-simplicity trade-off in 8 out of 16 datasets. When maximized for accuracy, CG is competitive with rule learners designed for this purpose, sometimes finding significantly simpler solutions that are no less accurate.
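To make the model class concrete, here is a minimal sketch of how a DNF rule set classifies binarized data and how its accuracy and simplicity could be scored. It is not the paper's integer program or its column generation procedure. The names `predict_dnf` and `complexity`, the toy data, and the complexity measure (one unit per clause plus one per condition) are assumptions made for illustration.

```python
import numpy as np

def predict_dnf(X, clauses):
    """Evaluate an OR-of-ANDs rule set on binary features.

    X       : (n_samples, n_features) array of 0/1 values.
    clauses : list of tuples of feature indices; each tuple is one
              conjunction (AND) over the listed features.
    Returns a 0/1 prediction per sample: 1 if any clause is satisfied.
    """
    X = np.asarray(X, dtype=bool)
    fired = np.zeros(X.shape[0], dtype=bool)
    for clause in clauses:
        # A clause fires on a sample when all of its features are 1.
        fired |= X[:, list(clause)].all(axis=1)
    return fired.astype(int)

def complexity(clauses):
    # Assumed simplicity measure: 1 per clause plus 1 per condition in it.
    return sum(1 + len(clause) for clause in clauses)

# Toy data (hypothetical): 4 samples, 3 binary features.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [0, 0, 0]])
y = np.array([1, 1, 0, 0])

# Rule set: (x0 AND x1) OR (x0 AND x2)
clauses = [(0, 1), (0, 2)]

pred = predict_dnf(X, clauses)
errors = int((pred != y).sum())
print("predictions:", pred.tolist())
print("training errors:", errors)
print("complexity:", complexity(clauses))
```

An optimizer of the kind the abstract describes would search over candidate clauses to balance the two quantities printed last; this sketch only evaluates one fixed rule set.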
