Learning Optimized Or's of And's

Or's of And's (OA) models consist of a small number of conjunctions joined by disjunctions, a form also known as disjunctive normal form. An example of an OA model is as follows: If ($x_1 = $ `blue' AND $x_2=$ `middle') OR ($x_1 = $ `yellow'), then predict $Y=1$, else predict $Y=0$. OA models have the advantage of being interpretable to human experts, since they are a set of conditions that concisely capture the characteristics of a specific subset of data. We present two optimization-based machine learning frameworks for constructing OA models: Optimized OA (OOA) and its faster version, Optimized OA with Approximations (OOAx). We prove theoretical bounds on the properties of patterns in an OA model. We build OA models as a diagnostic screening tool for obstructive sleep apnea that achieve high accuracy with a substantial gain in interpretability over other methods.
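
To make the example rule concrete, the following is a minimal sketch of how an OA model could be represented and evaluated. It covers prediction only and does not implement OOA or OOAx. The names `Pattern`, `predict`, and `example_model`, and the dictionary encoding of attributes, are assumptions made for illustration and do not come from the paper.

```python
from typing import Dict, List

# Hypothetical encoding: a pattern is a conjunction (AND) of conditions,
# stored as a mapping from attribute name to its required value.
Pattern = Dict[str, str]


def predict(model: List[Pattern], x: Dict[str, str]) -> int:
    """Return 1 if x satisfies at least one pattern in the model, else 0.

    The model is a disjunction (OR) over patterns; a pattern is satisfied
    only when every one of its conditions holds for x.
    """
    return int(
        any(all(x.get(attr) == value for attr, value in pattern.items())
            for pattern in model)
    )


# The example from the text:
# (x1 = 'blue' AND x2 = 'middle') OR (x1 = 'yellow')
example_model: List[Pattern] = [
    {"x1": "blue", "x2": "middle"},
    {"x1": "yellow"},
]

if __name__ == "__main__":
    for point in (
        {"x1": "blue", "x2": "middle"},   # satisfies the first pattern
        {"x1": "yellow", "x2": "left"},   # satisfies the second pattern
        {"x1": "blue", "x2": "left"},     # satisfies neither pattern
    ):
        print(point, "->", predict(example_model, point))
```

Under this encoding each pattern can be read directly as one human-checkable condition, which is the sense in which a model with few, short patterns remains interpretable.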
