Learning Certifiably Optimal Rule Lists: A Case for Discrete Optimization in the 21st Century

We present a new algorithm, CORELS, for constructing rule lists. It finds the optimal rule list and produces a proof of that optimality. Rule lists, which are ordered sequences of if-then statements, are similar to decision trees and are useful because each step in the model’s decision-making process is understandable by humans. CORELS uses the discrete optimization technique of branch-and-bound to eliminate large parts of the search space, rendering the problem computationally feasible. We use three types of bounds: bounds inherent to the rules themselves, bounds based on the current best solution, and bounds based on symmetries between rule lists. In addition, we use efficient data structures to minimize the memory usage and runtime of our algorithm on this exponentially hard search problem. CORELS demonstrates that discrete optimization on modern computers can certify optimality over an exponentially large search space, enabling the discovery and analysis of optimal solutions to problems that require human-interpretable models.
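To make the branch-and-bound idea concrete, the following is a minimal sketch, not the CORELS implementation: rules are hypothetical boolean masks over the samples, the regularization weight `lam` is an illustrative parameter, and only the second type of bound (pruning against the current best objective) is shown; the rule-inherent and symmetry bounds are omitted.

```python
def evaluate(prefix, rules, y, lam):
    """Return (objective, lower_bound) of a rule-list prefix.

    Each prefix rule predicts the majority label among the samples it
    captures; uncaptured samples fall through to a default majority-label
    rule. The lower bound counts only mistakes the prefix has already
    committed, so no extension of the prefix can ever beat it.
    """
    n = len(y)
    captured = [False] * n
    mistakes = 0
    for r in prefix:
        idx = [i for i in range(n) if rules[r][i] and not captured[i]]
        for i in idx:
            captured[i] = True
        ones = sum(y[i] for i in idx)
        mistakes += min(ones, len(idx) - ones)  # majority-label prediction
    rest = [y[i] for i in range(n) if not captured[i]]
    default_mistakes = min(sum(rest), len(rest) - sum(rest)) if rest else 0
    obj = (mistakes + default_mistakes) / n + lam * len(prefix)
    lb = mistakes / n + lam * len(prefix)  # best case: rest classified perfectly
    return obj, lb

def corels_style_search(rules, y, lam=0.01):
    """Branch-and-bound over rule-list prefixes, pruning by lower bound."""
    best_obj, _ = evaluate([], rules, y, lam)  # default rule alone
    best = []
    stack = [[]]
    while stack:
        prefix = stack.pop()
        for r in range(len(rules)):
            if r in prefix:
                continue
            child = prefix + [r]
            obj, lb = evaluate(child, rules, y, lam)
            if obj < best_obj:
                best_obj, best = obj, child
            if lb < best_obj:  # otherwise the whole subtree is pruned
                stack.append(child)
    return best, best_obj
```

Because the lower bound of a prefix underestimates the objective of every rule list extending it, discarding a prefix whenever its bound meets the current best objective removes the entire subtree rooted there without losing the optimum, which is what makes the exhaustive search tractable and the returned solution certifiably optimal within this toy setting.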
