A User‐based Visual Analytics Workflow for Exploratory Model Analysis

Many visual analytics systems allow users to interact with machine learning models for data exploration and insight generation on a given dataset. However, in some situations, insights may be less important than the production of an accurate predictive model for future use. In that case, users are more interested in generating diverse and robust predictive models, verifying their performance on holdout data, and selecting the most suitable model for their usage scenario. In this paper, we consider the concept of Exploratory Model Analysis (EMA), which is defined as the process of discovering and selecting relevant models that can be used to make predictions on a data source. We delineate the differences between EMA and the well‐known term exploratory data analysis in terms of the desired outcome of the analytic process: insights into the data or a set of deployable models. The contributions of this work are a visual analytics system workflow for EMA, a user study, and two use cases validating the effectiveness of the workflow. We found that our system workflow enabled users to generate complex models, to assess them for various qualities, and to select the most relevant model for their task.
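
As a rough illustration of the generate, verify, and select loop that EMA refers to, the following minimal sketch fits a few candidate models, scores each on holdout data, and ranks them. It is not the system described in this paper: the toy dataset, the three candidate models, and all function names (`fit_mean`, `fit_linear`, `fit_nearest`, `explore_models`) are hypothetical choices made only for illustration.

```python
from statistics import mean

# Hypothetical candidate models; each fitter returns a prediction function.
def fit_mean(xs, ys):
    m = mean(ys)
    return lambda x: m  # always predicts the training mean

def fit_linear(xs, ys):
    # Ordinary least squares for a single feature.
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept

def fit_nearest(xs, ys):
    # 1-nearest-neighbour lookup on the training points.
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

def holdout_mse(model, xs, ys):
    # Mean squared error on data the model was not fitted to.
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def explore_models(train, holdout, fitters):
    # Fit every candidate, score it on the holdout set, rank best first.
    tx, ty = train
    hx, hy = holdout
    scores = {name: holdout_mse(fit(tx, ty), hx, hy)
              for name, fit in fitters.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

# Toy data (an assumption for this sketch): y = 2x + 1.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
train = (xs[::2], ys[::2])        # even x values for fitting
holdout = (xs[1::2], ys[1::2])    # odd x values for verification

FITTERS = {"mean": fit_mean, "linear": fit_linear, "nearest": fit_nearest}
ranking = explore_models(train, holdout, FITTERS)
best_name, best_score = ranking[0]
print(ranking)
```

In an actual EMA setting the ranking would be one input among several; a user might also weigh model complexity or behaviour on specific subsets before selecting a model.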
