A Bayesian Framework for Learning Rule Sets for Interpretable Classification

We present a machine learning algorithm for building classifiers that consist of a small number of short rules. These are restricted disjunctive normal form (DNF) models. An example of a classifier of this form is: if X satisfies (condition A AND condition B) OR (condition C) OR ..., then Y = 1. Models of this form are interpretable to human experts because they produce a set of rules that concisely describes a specific class. We present two probabilistic models with prior parameters that the user can set to encourage the model to have a desired size and shape, so that it conforms to a domain-specific definition of interpretability. We provide a scalable MAP inference approach and develop theoretical bounds that reduce computation by iteratively pruning the search space. We apply our method, Bayesian Rule Sets (BRS), to characterize and predict user behavior with respect to in-vehicle context-aware personalized recommender systems. A major advantage of our method over classical associative classification methods and decision trees is that it does not greedily grow the model.
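
To make the model form concrete, the following is a minimal sketch of how a rule set of this kind could be evaluated at prediction time. It covers only the "(A AND B) OR (C)" decision rule from the example above, not the Bayesian priors or the MAP search. The names `Rule`, `RuleSet`, `satisfies`, and `predict`, and the boolean encoding of conditions, are assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List

# Assumed representation: a rule is a conjunction of condition names,
# and a rule set is a disjunction of such rules (a DNF formula).
Rule = List[str]
RuleSet = List[Rule]


def satisfies(rule: Rule, x: Dict[str, bool]) -> bool:
    """Return True if every condition in the rule holds for observation x.

    Conditions missing from x are treated as False (an assumption).
    """
    return all(x.get(cond, False) for cond in rule)


def predict(rule_set: RuleSet, x: Dict[str, bool]) -> int:
    """Return 1 if x satisfies at least one rule in the set, else 0."""
    return int(any(satisfies(rule, x) for rule in rule_set))


# The example classifier from the text: (A AND B) OR (C) => Y = 1.
rule_set: RuleSet = [["A", "B"], ["C"]]

observations = [
    {"A": True, "B": True, "C": False},   # first rule fires
    {"A": True, "B": False, "C": False},  # no rule fires
    {"A": False, "B": False, "C": True},  # second rule fires
]

for x in observations:
    print(x, "->", predict(rule_set, x))
```

Because the rules are unordered and combined with OR, an observation is labeled positive as soon as any single rule covers it, which is what lets each rule be read on its own as a description of the positive class.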
