StrategyAtlas: Strategy Analysis for Machine Learning Interpretability

Businesses in high-risk environments have been reluctant to adopt modern machine learning approaches due to their complex, uninterpretable nature. Most current solutions provide local, instance-level explanations, but these are insufficient for understanding a model as a whole. In this work, we show that strategy clusters (i.e., groups of data instances that are treated distinctly by the model) can be used to understand the global behavior of a complex ML model. To support effective exploration and understanding of these clusters, we introduce StrategyAtlas, a system designed to analyze and explain model strategies. The system also supports multiple ways to use these strategies to simplify and improve the reference model. In collaboration with a large insurance company, we present a use case in automatic insurance acceptance and show how the system enabled professional data scientists to understand a complex model and to improve the production model based on these insights.
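The core idea, grouping instances that the model treats in the same way, can be illustrated with a minimal sketch. The exact pipeline used by StrategyAtlas is not detailed here; the sketch below assumes each instance has a per-feature contribution vector (as produced by an explainer such as SHAP) and groups instances with similar contribution profiles using a small hand-rolled k-means. The synthetic data, the two "strategies", and the helper `kmeans` are illustrative assumptions, not the authors' implementation.

```python
import random

random.seed(0)

# Toy per-instance feature-contribution vectors (stand-ins for e.g. SHAP
# values). Two hypothetical model strategies: one driven mainly by
# feature 0, the other mainly by feature 1.
contributions = (
    [[random.gauss(2.0, 0.3), random.gauss(0.0, 0.3)] for _ in range(20)]
    + [[random.gauss(0.0, 0.3), random.gauss(2.0, 0.3)] for _ in range(20)]
)

def kmeans(points, k, iters=20):
    """Minimal k-means over contribution vectors; returns one label per point."""
    # Deterministic init: centers spread evenly over the point list.
    centers = [list(points[(i * (len(points) - 1)) // max(k - 1, 1)])
               for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])),
            )
        # Move each center to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Each resulting cluster is a candidate "strategy": a group of instances
# whose predictions are driven by the same features in the same way.
labels = kmeans(contributions, k=2)
```

Instances that end up in the same cluster share an explanation profile, so inspecting one cluster at a time gives a global, strategy-level view of the model rather than forty separate local explanations.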
