Building Interpretable Classifiers with Rules using Bayesian Analysis

We aim to produce predictive models that are not only accurate, but also interpretable to human experts. Our models are decision lists, which consist of a series of if...then... statements (for example, if high blood pressure, then stroke) that discretize a high-dimensional, multivariate feature space into a series of simple, readily interpretable decision statements. We introduce a generative model called the Bayesian List Machine (BLM), which yields a posterior distribution over possible decision lists and employs a novel prior structure to encourage sparsity. In terms of predictive accuracy, our experiments show that the BLM is on par with the current top algorithms for prediction in machine learning. Our method is motivated by recent developments in personalized medicine and can be used to produce highly accurate and interpretable medical scoring systems. We demonstrate this by producing an alternative to the CHADS2 score, which is actively used in clinical practice to estimate the risk of stroke in patients with atrial fibrillation. Our model is as interpretable as CHADS2, but more accurate.

if total cholesterol ≥ 160 and smoke then 10-year CHD risk ≥ 5%
else if smoke and systolic blood pressure ≥ 140 then 10-year CHD risk ≥ 5%
else 10-year CHD risk < 5%

Figure 1: Example decision list created using the NHLBI Framingham Heart Study Coronary Heart Disease (CHD) inventory for a 45-year-old male.
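A decision list of this kind is evaluated by walking its rules in order and letting the first matching condition fire. The sketch below illustrates that evaluation for the Figure 1 example; the feature names (`total_cholesterol`, `smoker`, `systolic_bp`) are hypothetical stand-ins for the study's variables, and this shows only how a fitted list is applied, not the BLM's Bayesian inference procedure.

```python
# Minimal sketch of evaluating an ordered decision list like Figure 1.
# Feature names are illustrative assumptions, not the paper's actual schema.

def predict_chd_risk(patient: dict) -> str:
    """Check each rule in order; the first matching condition fires."""
    rules = [
        (lambda p: p["total_cholesterol"] >= 160 and p["smoker"],
         "10-year CHD risk >= 5%"),
        (lambda p: p["smoker"] and p["systolic_bp"] >= 140,
         "10-year CHD risk >= 5%"),
    ]
    for condition, outcome in rules:
        if condition(patient):
            return outcome
    return "10-year CHD risk < 5%"  # default (else) rule

print(predict_chd_risk(
    {"total_cholesterol": 180, "smoker": True, "systolic_bp": 120}))
```

Because evaluation stops at the first rule that matches, the order of the statements is part of the model; the BLM's posterior is a distribution over both the rules and their ordering.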
