Interpretable and Fair Boolean Rule Sets via Column Generation

This paper considers the learning of Boolean rules in either disjunctive normal form (DNF, OR-of-ANDs, equivalent to decision rule sets) or conjunctive normal form (CNF, AND-of-ORs) as an interpretable model for classification. An integer program is formulated to optimally trade classification accuracy for rule simplicity. We also consider the fairness setting and extend the formulation to include explicit constraints on two different measures of classification parity: equality of opportunity and equalized odds. Column generation (CG) is used to efficiently search over an exponential number of candidate clauses (conjunctions or disjunctions) without the need for heuristic rule mining. This approach also bounds the gap between the selected rule set and the best possible rule set on the training data. To handle large datasets, we propose an approximate CG algorithm using randomization. Compared to three recently proposed alternatives, the CG algorithm dominates the accuracy-simplicity trade-off in 8 out of 16 datasets. When maximized for accuracy, CG is competitive with rule learners designed for this purpose, sometimes finding significantly simpler solutions that are no less accurate. Compared to other fair and interpretable classifiers, our method is able to find rule sets that meet stricter notions of fairness with a modest trade-off in accuracy.
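To make the master/pricing decomposition in the abstract concrete, below is a minimal sketch of the column-generation loop for the DNF (OR-of-ANDs) case on binarized 0/1 features: a restricted master LP assigns fractional weights to a pool of conjunctions, penalizing missed positives (slacks), covered negatives, and clause complexity; its duals define a reduced cost, and any conjunction that prices out negative is added to the pool. Everything here is an illustrative assumption rather than the paper's implementation: the helper names (`fit_dnf`, `solve_master`, `price_clause`), the penalty `lam`, brute-force pricing over positive-literal conjunctions of at most two features (the paper searches the exponential clause space with an optimization model rather than enumeration), and the naive thresholding of LP weights in place of an exact integer-programming selection.

```python
import itertools
import numpy as np
from scipy.optimize import linprog

def covers(clause, X):
    """Boolean mask of samples satisfying a conjunction of feature indices."""
    return X[:, list(clause)].all(axis=1)

def solve_master(clauses, X, y, lam):
    """Restricted master LP over clause weights w_k and positive-sample slacks xi_i."""
    P, N = np.where(y == 1)[0], np.where(y == 0)[0]
    K = len(clauses)
    cov = np.array([covers(c, X) for c in clauses], dtype=float)  # K x n coverage
    # objective: missed positives + covered negatives + lam * clause complexity
    c_w = cov[:, N].sum(axis=1) + lam * np.array([1 + len(c) for c in clauses])
    c = np.concatenate([c_w, np.ones(len(P))])        # variables: [w_1..w_K, xi_i...]
    # coverage constraints: xi_i + sum_{k covers i} w_k >= 1 for each positive i
    A = np.hstack([-cov[:, P].T, -np.eye(len(P))])
    res = linprog(c, A_ub=A, b_ub=-np.ones(len(P)), bounds=(0, None), method="highs")
    mu = -res.ineqlin.marginals                       # nonnegative duals of coverage rows
    return res.x[:K], dict(zip(P, mu))

def price_clause(X, y, mu, lam, max_deg=2):
    """Brute-force pricing: return the conjunction with the most negative reduced cost."""
    N = np.where(y == 0)[0]
    best, best_rc = None, -1e-6                       # small tolerance against cycling
    for d in range(1, max_deg + 1):
        for clause in itertools.combinations(range(X.shape[1]), d):
            m = covers(clause, X)
            rc = lam * (1 + d) + m[N].sum() - sum(mu[i] for i in mu if m[i])
            if rc < best_rc:
                best, best_rc = clause, rc
    return best

def fit_dnf(X, y, lam=0.1, iters=20):
    """Column generation: grow the clause pool until no column prices out negative."""
    clauses = [(j,) for j in range(X.shape[1])]       # seed pool: single literals
    for _ in range(iters):
        _, mu = solve_master(clauses, X, y, lam)
        new = price_clause(X, y, mu, lam)
        if new is None or new in clauses:
            break                                     # no improving column remains
        clauses.append(new)
    w, _ = solve_master(clauses, X, y, lam)
    return [clauses[k] for k in np.where(w > 0.5)[0]] # naive rounding of LP weights
```

On a small binary dataset, `fit_dnf(X, y)` returns a list of conjunctions whose disjunction is the learned rule set. The key mechanism is the dual vector `mu`: it lets the pricing step evaluate clauses that the master LP has never explicitly enumerated, which is what allows CG to search an exponential candidate space.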

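The fairness extension can be sketched in the same vocabulary. Each slack xi_i lower-bounds whether positive sample i is missed, so averaging the slacks within each protected group approximates that group's false-negative rate, and an equality-of-opportunity style constraint bounds the gap between the two group averages. The encoding below, including the helper name `fairness_rows`, the tolerance `eps`, and the 0/1 `group` array, is a hedged guess at one natural LP encoding consistent with the master above, not the paper's exact constraint set.

```python
# Hedged sketch: append two <=-rows to the master LP bounding the gap in
# group-wise false-negative rates, |FNR_g1 - FNR_g0| <= eps, via the xi's.
# `group`, `eps`, and `fairness_rows` are illustrative assumptions.
def fairness_rows(P, group, K, eps):
    """Rows over variables [w_1..w_K, xi_i...] matching solve_master's ordering."""
    g = group[P].astype(float)                 # group membership of positives
    n1, n0 = g.sum(), len(g) - g.sum()
    d = g / n1 - (1 - g) / n0                  # xi coefficients of the FNR gap
    row = np.concatenate([np.zeros(K), d])     # zeros over the clause weights w_k
    return np.vstack([row, -row]), np.array([eps, eps])

# Usage inside solve_master (sketch): stack onto the coverage system before linprog.
# A_fair, b_fair = fairness_rows(P, group, K, eps)
# A, b = np.vstack([A, A_fair]), np.concatenate([-np.ones(len(P)), b_fair])
```

Because these rows place coefficients only on the xi columns, the pricing step above is unchanged. An equalized-odds style variant that also bounds gaps in false-positive coverage would put coefficients on the w_k columns (negative-sample coverage differs by clause), and its duals would then enter the reduced cost used in pricing.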