Interpretable Decision Sets: A Joint Framework for Description and Prediction

One of the most important obstacles to deploying predictive models is that humans do not understand or trust them. Knowing which variables matter for a model's prediction, and how they are combined, can help people understand and trust automatic decision-making systems. Here we propose interpretable decision sets, a framework for building predictive models that are highly accurate yet also highly interpretable. Decision sets are sets of independent if-then rules. Because each rule can be applied independently, decision sets are simple, concise, and easily interpretable. We formalize decision set learning through an objective function that simultaneously optimizes the accuracy and interpretability of the rules. In particular, our approach learns short, accurate, and non-overlapping rules that cover the whole feature space and pay attention to small but important classes. Moreover, we prove that our objective is a non-monotone submodular function, which we optimize efficiently to find a near-optimal set of rules. Experiments show that interpretable decision sets are as accurate at classification as state-of-the-art machine learning techniques, while being on average three times smaller than rule-based models learned by other methods. Finally, a user study shows that people answer multiple-choice questions about the decision boundaries of interpretable decision sets, and write descriptions of classes based on them, faster and more accurately than with other rule-based models designed for interpretability. Overall, our framework provides a new approach to interpretable machine learning that balances accuracy, interpretability, and computational efficiency.
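To make the abstract's terms concrete, here is a minimal Python sketch of a decision set. Each rule is an independent conjunction of feature-value tests paired with a class label. The sketch also computes simple versions of three quantities the abstract says the objective balances: overlap between rules, coverage of the data, and rule length. This is an illustration under stated assumptions, not the paper's objective or learning algorithm. The abstract does not say how a prediction is made when several rules fire or when none does, so the sketch assumes a majority vote with a fallback default class. The helper names `predict`, `overlap`, `coverage`, and `total_length`, and the toy features and labels, are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, List, Tuple

# A data point maps each (hypothetical) categorical feature name to its value.
Point = Dict[str, str]


@dataclass(frozen=True)
class Rule:
    """IF every (feature == value) condition holds THEN predict `label`."""
    conditions: Tuple[Tuple[str, str], ...]
    label: str

    def covers(self, x: Point) -> bool:
        return all(x.get(f) == v for f, v in self.conditions)


def predict(rules: List[Rule], x: Point, default: str) -> str:
    # Assumption: majority vote among the rules that fire; ties go to the label
    # of the earliest firing rule, because max() returns the first maximum.
    votes = Counter(r.label for r in rules if r.covers(x))
    if not votes:
        return default  # assumption: fallback class when no rule covers x
    return max(votes, key=votes.__getitem__)


def overlap(rules: List[Rule], data: List[Point]) -> int:
    # Sum over rule pairs of the points both rules cover (lower = less overlap).
    return sum(
        sum(1 for x in data if a.covers(x) and b.covers(x))
        for a, b in combinations(rules, 2)
    )


def coverage(rules: List[Rule], data: List[Point]) -> float:
    # Fraction of points covered by at least one rule.
    return sum(any(r.covers(x) for r in rules) for x in data) / len(data)


def total_length(rules: List[Rule]) -> int:
    # Total number of conditions, a simple proxy for "short" rules.
    return sum(len(r.conditions) for r in rules)


# Toy usage with made-up features and labels.
data = [
    {"fever": "yes", "cough": "yes"},
    {"fever": "yes", "cough": "no"},
    {"fever": "no", "cough": "yes"},
    {"fever": "no", "cough": "no"},
]
rules = [
    Rule((("fever", "yes"), ("cough", "yes")), "flu"),
    Rule((("fever", "no"),), "healthy"),
]
print([predict(rules, x, default="cold") for x in data])
print(overlap(rules, data), coverage(rules, data), total_length(rules))
```

In this toy set the two rules never overlap, but the second point is not covered by any rule and falls through to the default class. Adding a rule `fever == yes -> cold` would raise coverage to 1.0, but it would also overlap the first rule on one point. This is the kind of trade-off that an objective combining accuracy, overlap, coverage, and length has to weigh.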
