Learning Interpretable Classification Rules with Boolean Compressed Sensing

An important problem in supervised machine learning is designing systems that humans can interpret. In domains such as law, medicine, and finance, which directly affect human lives, delegating decisions to a black-box machine-learning model carries significant operational risk and often legal implications, so interpretable classifiers are required. Building on ideas from Boolean compressed sensing, we propose a rule-based classifier that explicitly balances accuracy against interpretability in a principled optimization formulation. We represent the problem of learning conjunctive or disjunctive clauses as an adaptation of a classical problem from statistics, Boolean group testing, and apply a novel linear programming (LP) relaxation to find solutions. We derive theoretical conditions for exactly recovering sparse rules that parallel the conditions for exact recovery of sparse signals in the compressed sensing literature. This is an exciting development in interpretable learning, where most prior work has focused on heuristic solutions. We also consider a more general class of rule-based classifiers, checklists and scorecards, learned using ideas from threshold group testing. We demonstrate competitive classification accuracy with the proposed approach on real-world data sets.
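To make the group-testing view concrete, the following is a minimal sketch of an LP relaxation for learning a sparse disjunctive clause (OR-rule), in the spirit of the formulation described above. The function name, the slack penalty `C`, and the rounding threshold are illustrative assumptions, not the paper's exact implementation; conjunctive clauses can be handled analogously by complementing features and labels (De Morgan's laws).

```python
import numpy as np
from scipy.optimize import linprog

def learn_disjunctive_rule(A, y, C=10.0, thresh=0.5):
    """Sketch of an LP relaxation for learning a sparse OR-rule.

    A : (n, p) binary matrix; A[i, j] = 1 if Boolean feature j holds on sample i.
    y : (n,) binary labels. We seek a sparse w in {0,1}^p whose Boolean
        matrix-vector product with A reproduces y, i.e. y_i = OR_j (A_ij AND w_j).

    Relaxed LP:  minimize 1'w + C * 1'xi
                 s.t.  A_pos w + xi_pos >= 1   (each positive sample is covered)
                       A_neg w <= xi_neg       (negative samples fire no literal)
                       0 <= w <= 1, xi >= 0
    The fractional solution w is rounded to a Boolean rule by thresholding.
    """
    A = np.asarray(A, dtype=float)
    y = np.asarray(y)
    pos, neg = A[y == 1], A[y == 0]
    n_pos, n_neg, p = len(pos), len(neg), A.shape[1]

    # Decision variables, stacked: [w (p), xi_pos (n_pos), xi_neg (n_neg)].
    c = np.concatenate([np.ones(p), C * np.ones(n_pos + n_neg)])

    # Positive samples:  -(A_pos w + xi_pos) <= -1
    A_ub_pos = np.hstack([-pos, -np.eye(n_pos), np.zeros((n_pos, n_neg))])
    b_ub_pos = -np.ones(n_pos)
    # Negative samples:  A_neg w - xi_neg <= 0
    A_ub_neg = np.hstack([neg, np.zeros((n_neg, n_pos)), -np.eye(n_neg)])
    b_ub_neg = np.zeros(n_neg)

    A_ub = np.vstack([A_ub_pos, A_ub_neg])
    b_ub = np.concatenate([b_ub_pos, b_ub_neg])
    bounds = [(0, 1)] * p + [(0, None)] * (n_pos + n_neg)

    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:p]
    return (w >= thresh).astype(int)
```

The `1'w` term drives sparsity (few literals in the learned clause, hence interpretability), while `C * 1'xi` penalizes classification errors; varying `C` traces out the accuracy-versus-interpretability trade-off the abstract refers to.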
