The fused lasso penalty for learning interpretable medical scoring systems

Score learning aims to use supervised learning to estimate interpretable models that facilitate decision making. Ideally, a scoring system relies on simple arithmetic operations, is sparse, and can be easily explained by human experts. In this contribution, we introduce a new method that simultaneously learns interpretable bins mapped to a class variable and the weights with which these bins contribute to the score. Numerical experiments on benchmark data sets show that our approach is competitive with state-of-the-art methods. On a real medical problem, the prediction of type 2 diabetes remission, we show that a scoring system learned automatically is comparable to one constructed manually by clinicians.
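
As an illustration of how a fused lasso penalty can tie binning and weights together, the following is a minimal sketch and not the method of this paper. It assumes a single feature cut into fine candidate bins with one weight per bin, and uses the standard fused lasso form, lam_sparse * sum_j |w_j| + lam_fuse * sum_j |w_j - w_(j-1)|. The function names, the bin edges, and the weights are hypothetical and chosen only for the example.

```python
import numpy as np


def fused_lasso_penalty(w, lam_sparse, lam_fuse):
    """Standard fused lasso penalty on an ordered weight vector.

    lam_sparse * sum |w_j| pushes weights toward zero (sparsity);
    lam_fuse * sum |w_j - w_(j-1)| pushes neighbouring weights to be equal.
    """
    w = np.asarray(w, dtype=float)
    sparsity = np.abs(w).sum()
    fusion = np.abs(np.diff(w)).sum()
    return lam_sparse * sparsity + lam_fuse * fusion


def merge_equal_bins(edges, w, tol=1e-8):
    """Merge adjacent bins whose weights are equal up to tol.

    edges has len(w) + 1 entries; bin j covers [edges[j], edges[j + 1]).
    Returns the coarser edges and one weight per merged bin.
    """
    merged_edges = [edges[0]]
    merged_w = []
    for j, wj in enumerate(w):
        if merged_w and abs(wj - merged_w[-1]) <= tol:
            # Same weight as the previous bin: extend its right edge.
            merged_edges[-1] = edges[j + 1]
        else:
            merged_w.append(wj)
            merged_edges.append(edges[j + 1])
    return merged_edges, merged_w


# Hypothetical example: five fine bins of an "age" feature.
edges = [20, 30, 40, 50, 60, 70]
weights = [0, 0, 1, 1, 3]

penalty = fused_lasso_penalty(weights, lam_sparse=0.1, lam_fuse=1.0)
coarse_edges, coarse_weights = merge_equal_bins(edges, weights)
```

In this sketch, neighbouring bins that the fusion term has driven to the same weight collapse into a single interval, so the five fine bins above reduce to three intervals with one integer-like weight each. That is the sense in which a single penalty can yield both the binning and the score weights.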
