Or's of And's for Interpretable Classification, with Application to Context-Aware Recommender Systems

We present a machine learning algorithm for building classifiers that are composed of a small number of disjunctions of conjunctions (or's of and's). An example of a classifier of this form is as follows: if X satisfies (x1 = 'blue' AND x3 = 'middle') OR (x1 = 'blue' AND x2 = '<15') OR (x1 = 'yellow'), then predict Y = 1, else predict Y = 0. An attribute-value pair is called a literal, and a conjunction of literals is called a pattern. Models of this form have the advantage of being interpretable to human experts, since they produce a set of conditions that concisely describes a specific class. We present two probabilistic models for forming a pattern set, one with a Beta-Binomial prior and the other with Poisson priors. In both cases, the user can set prior parameters to encourage the model to have a desired size and shape, so that it conforms to a domain-specific definition of interpretability. We provide two scalable MAP inference approaches: a pattern-level search, which involves association rule mining, and a literal-level search. We show that stronger priors reduce computation. We apply the Bayesian Or's of And's (BOA) model to predict user behavior with respect to in-vehicle context-aware personalized recommender systems.
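To make the terminology concrete, the example classifier from the abstract can be written as a small Python sketch. This only illustrates how a fixed pattern set is evaluated; it is not the BOA learning procedure, and the names `PATTERN_SET`, `satisfies`, and `predict` are hypothetical, chosen here for illustration.

```python
from typing import Dict, List, Tuple

# A literal is an (attribute, value) pair; a pattern is a conjunction of literals.
Literal = Tuple[str, str]
Pattern = List[Literal]

# The pattern set from the abstract's example (names are illustrative assumptions).
PATTERN_SET: List[Pattern] = [
    [("x1", "blue"), ("x3", "middle")],
    [("x1", "blue"), ("x2", "<15")],
    [("x1", "yellow")],
]


def satisfies(x: Dict[str, str], pattern: Pattern) -> bool:
    """Return True if x matches every literal in the pattern (the AND)."""
    return all(x.get(attribute) == value for attribute, value in pattern)


def predict(x: Dict[str, str], pattern_set: List[Pattern] = PATTERN_SET) -> int:
    """Predict Y=1 if x satisfies at least one pattern (the OR), else Y=0."""
    return int(any(satisfies(x, pattern) for pattern in pattern_set))


if __name__ == "__main__":
    # Matches the second pattern: x1 = 'blue' AND x2 = '<15'.
    print(predict({"x1": "blue", "x2": "<15", "x3": "top"}))
    # Matches no pattern.
    print(predict({"x1": "blue", "x2": ">=15", "x3": "top"}))
```

Representing the model as a plain list of patterns keeps the interpretable form visible: each inner list reads directly as one condition describing the positive class, and the size and shape controlled by the priors correspond to the number of patterns and the number of literals in each.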
