Efficient Data Representation by Selecting Prototypes with Importance Weights

Prototypical examples that best summarize and compactly represent an underlying complex data distribution communicate meaningful insights to humans in domains where simple explanations are hard to extract. In this paper, we present algorithms with strong theoretical guarantees to mine such datasets and select prototypes, a.k.a. representatives, that optimally describe them. Our work notably generalizes the recent work by Kim et al. (2016): in addition to selecting prototypes, we also associate non-negative weights with them that are indicative of their importance. This extension provides a single coherent framework under which both prototypes and criticisms (i.e. outliers) can be found. Furthermore, our framework works for any symmetric positive definite kernel, thus addressing one of the key open questions laid out in Kim et al. (2016). By establishing that our objective function enjoys the key property of weak submodularity, we present a fast ProtoDash algorithm and derive approximation guarantees for it. We demonstrate the efficacy of our method on diverse domains such as retail, digit recognition (MNIST), and 40 publicly available health questionnaires obtained from the website of the Centers for Disease Control and Prevention (CDC), maintained by the U.S. Department of Health and Human Services. We validate the results quantitatively as well as qualitatively, based on expert feedback and recently published scientific studies on public health, showcasing the power of our technique in providing actionability (for retail), utility (for MNIST), and insight (on the CDC datasets), which arguably are the hallmarks of an effective interpretable machine learning method.
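
To make the weighted greedy selection concrete, below is a minimal Python sketch of a ProtoDash-style loop. It assumes the MMD-style quadratic set function commonly used for prototype selection: maximize l(w) = w^T mu - 0.5 * w^T K w over non-negative weights w supported on the chosen set, where K is the kernel matrix over candidates and mu holds each candidate's mean kernel similarity to the target data. The function name protodash_sketch and the Cholesky-based reduction to non-negative least squares are illustrative choices under these assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import nnls

def protodash_sketch(K, mu, m):
    """Greedy weighted prototype selection: a minimal sketch of the
    ProtoDash idea described above (not the authors' reference code).

    K  : (n, n) symmetric positive definite kernel matrix over candidates
    mu : (n,)   mean kernel similarity of each candidate to the target data
    m  : number of prototypes to select

    Assumed objective: l(w) = w^T mu - 0.5 * w^T K w, maximized over
    non-negative weights w supported on the selected index set.
    """
    selected = []
    w = np.zeros(0)
    for _ in range(m):
        # Gradient of l at the current solution (off-support weights are 0).
        grad = mu - K[:, selected] @ w
        grad[selected] = -np.inf  # never re-pick an already chosen prototype
        selected.append(int(np.argmax(grad)))
        # Refit the non-negative importance weights on the enlarged support:
        # with K_S = L L^T, minimizing 0.5 * ||L^T w - L^{-1} mu_S||^2 over
        # w >= 0 matches the quadratic objective, so plain NNLS solves it.
        K_S = K[np.ix_(selected, selected)]
        L = np.linalg.cholesky(K_S + 1e-10 * np.eye(len(selected)))
        w, _ = nnls(L.T, np.linalg.solve(L, mu[selected]))
    return selected, w
```

On the first pass no weights exist yet, so the gradient reduces to mu and the first prototype is simply the candidate with the highest mean kernel similarity to the target distribution; subsequent picks maximize the marginal gain of the weakly submodular objective.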

[1] H. Sebastian Seung et al. Algorithms for Non-negative Matrix Factorization. NIPS, 2000.

[2] P. Stark. Bounded-Variable Least-Squares: an Algorithm and Applications. 2008.

[3] Mark Weiser et al. Programmers use slices when debugging. CACM, 1982.

[4] Martin Grötschel et al. Mathematical Programming: The State of the Art, XIth International Symposium on Mathematical Programming, Bonn, Germany, August 23-27, 1982. ISMP, 1983.

[5] R. Tibshirani et al. Prototype selection for interpretable classification. 2011, arXiv:1202.5933.

[6] Hua Zhou et al. Algorithms for Fitting the Constrained Lasso. Journal of Computational and Graphical Statistics, 2016.

[7] Cynthia Rudin et al. Falling Rule Lists. AISTATS, 2014.

[8] Davide Anguita et al. Tikhonov, Ivanov and Morozov regularization for support vector machine learning. Machine Learning, 2015.

[9] Abhimanyu Das et al. Submodular meets Spectral: Greedy Algorithms for Subset Selection, Sparse Approximation and Dictionary Selection. ICML, 2011.

[10] Alexandros G. Dimakis et al. Scalable Greedy Feature Selection via Weak Submodularity. AISTATS, 2017.

[11] Yoram Singer et al. Support Vector Machines on a Budget. NIPS, 2006.

[12] V. Ivanov et al. The Theory of Approximate Methods and Their Application to the Numerical Solution of Singular Integral Equations. 1978.

[13] Alexander J. Smola et al. Linear-Time Estimators for Propensity Scores. AISTATS, 2011.

[14] Oluwasanmi Koyejo et al. Examples are not enough, learn to criticize! Criticism for Interpretability. NIPS, 2016.

[15] Percy Liang et al. Understanding Black-box Predictions via Influence Functions. ICML, 2017.

[16] Chih-Jen Lin et al. LIBSVM: A library for support vector machines. ACM TIST, 2011.

[17] Carlos Guestrin et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier. arXiv, 2016.

[18] Alexandros G. Dimakis et al. Restricted Strong Convexity Implies Weak Submodularity. The Annals of Statistics, 2016.

[19] Bernhard Schölkopf et al. A Kernel Method for the Two-Sample-Problem. NIPS, 2006.

[20] Amit Dhurandhar et al. Supervised item response models for informative prediction. Knowledge and Information Systems, 2016.

[21] Kush R. Varshney et al. Interpretable Two-level Boolean Rule Learning for Classification. arXiv, 2015.

[22] M. L. Fisher et al. An analysis of approximations for maximizing submodular set functions—I. Mathematical Programming, 1978.