ProtoDash: Fast Interpretable Prototype Selection

In this paper we propose ProtoDash, an efficient algorithm for selecting prototypical examples from complex datasets. Our work builds on the learn to criticize (L2C) work of Kim et al. (2016) and generalizes it to not only select prototypes for a given sparsity level $m$ but also associate with each prototype a non-negative weight indicating its importance. Unlike L2C, this extension provides a single coherent framework under which both prototypes and criticisms (i.e., the lowest-weighted prototypes) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the open questions laid out in Kim et al. (2016). The additional requirement of learning non-negative weights introduces technical challenges, as the objective is no longer submodular as it was in the previous work. However, we show that the problem is weakly submodular and derive approximation guarantees for our fast ProtoDash algorithm. Moreover, ProtoDash can not only find prototypical examples for a dataset $X$ but can also find (weighted) prototypical examples from $X^{(2)}$ that best represent another dataset $X^{(1)}$, where $X^{(1)}$ and $X^{(2)}$ belong to the same feature space. We demonstrate the efficacy of our method on diverse domains, namely retail, digit recognition (MNIST), and the 40 latest publicly available health questionnaires obtained from the Centers for Disease Control and Prevention (CDC) website maintained by the US Dept. of Health. We validate the results quantitatively as well as qualitatively, based on expert feedback and recently published scientific studies on public health.
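The abstract does not spell out the objective or the selection rule, so the following is only a minimal sketch of greedy weighted prototype selection under stated assumptions. It assumes the objective $w^\top \mu - \frac{1}{2} w^\top K w$ with $w \ge 0$ and at most $m$ non-zero entries, where $K$ is the kernel matrix on the candidate set $X^{(2)}$ and $\mu_j$ is the mean kernel similarity of candidate $j$ to the target set $X^{(1)}$. The Gaussian kernel, the projected gradient solver (a simple stand-in for a quadratic programming solver), and all function names are illustrative choices, not taken from the paper.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def fit_weights(K, mu, selected, steps=500):
    """Maximize w.mu - 0.5 * w'Kw over w >= 0, restricted to `selected`.

    Uses projected gradient ascent (assumed stand-in for a QP solver).
    """
    k = len(selected)
    # The trace bounds the largest eigenvalue of a PSD matrix,
    # so 1 / trace is a safe step size.
    step = 1.0 / sum(K[i][i] for i in selected)
    w = [0.0] * k
    for _ in range(steps):
        grad = [
            mu[selected[a]]
            - sum(K[selected[a]][selected[b]] * w[b] for b in range(k))
            for a in range(k)
        ]
        # Gradient step followed by projection onto the non-negative orthant.
        w = [max(0.0, w[a] + step * grad[a]) for a in range(k)]
    return w


def select_prototypes(target, candidates, m, bandwidth=1.0):
    """Greedily pick up to m weighted prototypes from `candidates`
    that represent `target` (hypothetical helper, assumed objective)."""
    n = len(candidates)
    K = [
        [gaussian_kernel(candidates[i], candidates[j], bandwidth) for j in range(n)]
        for i in range(n)
    ]
    mu = [
        sum(gaussian_kernel(c, t, bandwidth) for t in target) / len(target)
        for c in candidates
    ]
    selected, weights = [], []
    while len(selected) < m:
        best, best_gain = None, 0.0
        for j in range(n):
            if j in selected:
                continue
            # Gradient of the objective with respect to w_j at the current weights.
            gain = mu[j] - sum(K[j][s] * w for s, w in zip(selected, weights))
            if gain > best_gain:
                best, best_gain = j, gain
        if best is None:
            break  # no remaining candidate has a positive gradient
        selected.append(best)
        weights = fit_weights(K, mu, selected)
    return selected, weights
```

Calling `select_prototypes(X1, X2, m)` with lists of feature vectors returns the chosen candidate indices and their non-negative weights; passing the same list for both arguments corresponds to summarizing a single dataset $X$. Under this sketch, the selected points with the smallest weights would play the role the abstract assigns to criticisms.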

[1] Mark Weiser, et al. Programmers use slices when debugging, 1982, CACM.

[2] Johannes Gehrke, et al. Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission, 2015, KDD.

[3] Alexandros G. Dimakis, et al. Restricted Strong Convexity Implies Weak Submodularity, 2016, The Annals of Statistics.

[4] R. Tibshirani, et al. Prototype selection for interpretable classification, 2011, 1202.5933.

[5] Alexander J. Smola, et al. Linear-Time Estimators for Propensity Scores, 2011, AISTATS.

[6] Percy Liang, et al. Understanding Black-box Predictions via Influence Functions, 2017, ICML.

[7] Kush R. Varshney, et al. Interpretable Two-level Boolean Rule Learning for Classification, 2015, ArXiv.

[8] M. L. Fisher, et al. An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[9] Alexandros G. Dimakis, et al. Scalable Greedy Feature Selection via Weak Submodularity, 2017, AISTATS.

[10] Amit Dhurandhar, et al. Supervised item response models for informative prediction, 2016, Knowledge and Information Systems.

[11] David A. Wagner, et al. Towards Evaluating the Robustness of Neural Networks, 2016, 2017 IEEE Symposium on Security and Privacy (SP).

[12] Guigang Zhang, et al. Deep Learning, 2016, Int. J. Semantic Comput.

[13] Oluwasanmi Koyejo, et al. Examples are not enough, learn to criticize! Criticism for Interpretability, 2016, NIPS.

[14] Hua Zhou, et al. Algorithms for Fitting the Constrained Lasso, 2016, Journal of computational and graphical statistics : a joint publication of American Statistical Association, Institute of Mathematical Statistics, Interface Foundation of North America.

[15] Abhimanyu Das, et al. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection, 2011, ICML.

[16] Chih-Jen Lin, et al. LIBSVM: A library for support vector machines, 2011, TIST.

[17] Carlos Guestrin, et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier, 2016, ArXiv.

[18] Cynthia Rudin, et al. Falling Rule Lists, 2014, AISTATS.

[19] Satoru Fujishige, et al. Submodular functions and optimization, 1991.