Generalized Linear Rule Models

This paper considers generalized linear models using rule-based features, also referred to as rule ensembles, for regression and probabilistic classification. Rules facilitate model interpretation while also capturing nonlinear dependences and interactions. Our problem formulation accordingly trades off rule set complexity and prediction accuracy. Column generation is used to optimize over an exponentially large space of rules without pre-generating a large subset of candidates or greedily boosting rules one by one. The column generation subproblem is solved using either integer programming or a heuristic optimizing the same objective. In experiments involving logistic and linear regression, the proposed methods obtain better accuracy-complexity trade-offs than existing rule ensemble algorithms. At one end of the trade-off, the methods are competitive with less interpretable benchmark models.
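The column generation loop described above can be illustrated with a minimal sketch. It is not the paper's formulation. It assumes binary input features, a squared-error loss with one uniform L1 penalty `lam` on every rule, and brute-force enumeration of short conjunctions as a stand-in for the integer program or heuristic that solves the pricing subproblem. All function names (`rule_column`, `fit_lasso`, `column_generation`) are hypothetical.

```python
import itertools
import numpy as np

def rule_column(X, rule):
    """Evaluate a conjunction rule (tuple of feature indices) on binary X."""
    return X[:, list(rule)].all(axis=1).astype(float)

def fit_lasso(A, y, lam, n_iter=200):
    """Coordinate descent for (1/2n)||y - b0 - A w||^2 + lam * ||w||_1.
    The intercept b0 is not penalized."""
    n, k = A.shape
    w = np.zeros(k)
    b0 = y.mean()
    for _ in range(n_iter):
        b0 = np.mean(y - A @ w)
        for j in range(k):
            # Partial residual with column j's contribution removed.
            r = y - b0 - A @ w + A[:, j] * w[j]
            rho = A[:, j] @ r / n
            z = A[:, j] @ A[:, j] / n
            # Soft-thresholding update for coordinate j.
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z if z > 0 else 0.0
    return b0, w

def column_generation(X, y, lam, max_degree=2, max_rounds=20, tol=1e-6):
    """Grow the rule set one column at a time (simplified sketch)."""
    n, d = X.shape
    # Assumption: the candidate space is small enough to enumerate here.
    # The paper instead searches an exponentially large space implicitly.
    candidates = [c for k in range(1, max_degree + 1)
                  for c in itertools.combinations(range(d), k)]
    rules = []
    for _ in range(max_rounds):
        A = (np.column_stack([rule_column(X, r) for r in rules])
             if rules else np.zeros((n, 0)))
        # Restricted master problem: fit using only the current rules.
        b0, w = fit_lasso(A, y, lam)
        residual = y - b0 - A @ w
        # Pricing: find the rule whose column has the largest |gradient|.
        scores = [abs(rule_column(X, c) @ residual) / n for c in candidates]
        best = int(np.argmax(scores))
        # A zero-weight column can improve the objective only if its
        # gradient magnitude exceeds lam; otherwise stop.
        if scores[best] <= lam + tol or candidates[best] in rules:
            break
        rules.append(candidates[best])
    # Final refit so the weights match the returned rule set.
    A = (np.column_stack([rule_column(X, r) for r in rules])
         if rules else np.zeros((n, 0)))
    b0, w = fit_lasso(A, y, lam)
    return rules, b0, w
```

For example, on synthetic data where the response depends on a single interaction, `y = 2 * (x0 AND x1)`, calling `column_generation(X, y, lam=0.05)` should recover the conjunction `(0, 1)` among the selected rules. Raising `lam` yields fewer rules at the cost of a looser fit, which is the accuracy-complexity trade-off in miniature.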
