Interacting with Predictions: Visual Inspection of Black-box Machine Learning Models

Understanding predictive models, both interpreting them and identifying actionable insights, is a challenging task. Often the importance of a feature in a model is only a rough estimate condensed into one number. Our research goes beyond these naïve estimates through the design and implementation of an interactive visual analytics system, Prospector. Prospector provides interactive partial dependence diagnostics that let data scientists understand how features affect the prediction overall. In addition, its support for localized inspection lets data scientists understand how and why specific datapoints are predicted as they are, and lets them tweak feature values and see how the prediction responds. We evaluate the system in a case study involving a team of data scientists improving predictive models for detecting the onset of diabetes from electronic medical records.
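
To make the two diagnostics concrete, the following is a minimal sketch of how partial dependence and a localized "tweak one feature" inspection could be computed for any black-box model. It is not Prospector's implementation: the function names `partial_dependence` and `local_response`, the stand-in model `toy_model`, and the synthetic data are all assumptions introduced for illustration.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Mean prediction over the dataset with column `feature` forced to each grid value."""
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value      # overwrite the feature for every row
        curve.append(predict(X_mod).mean())
    return np.array(curve)

def local_response(predict, x, feature, grid):
    """Predictions for one datapoint `x` as a single feature is varied over the grid."""
    X_mod = np.tile(x, (len(grid), 1))  # one copy of x per grid value
    X_mod[:, feature] = grid
    return predict(X_mod)

def toy_model(X):
    # Hypothetical stand-in for a black-box classifier: it returns a
    # probability-like score that rises with feature 0 and falls with feature 1.
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # synthetic data, for illustration only
grid = np.linspace(-2.0, 2.0, 5)

pd_curve = partial_dependence(toy_model, X, feature=0, grid=grid)
local_curve = local_response(toy_model, X[0], feature=0, grid=grid)
```

Both functions only call `predict`, so any model that maps a feature matrix to scores can be substituted for `toy_model`. The partial dependence curve summarizes the overall effect of a feature, while the local curve shows how the prediction for one specific datapoint responds when that feature is tweaked.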
