A User‐based Visual Analytics Workflow for Exploratory Model Analysis

A recent advancement in the machine learning community is the development of automated machine learning (autoML) systems, such as Auto-WEKA or Google's Cloud AutoML, which automate the model selection and tuning process. However, while autoML tools give users access to arbitrarily complex models, they typically return those models with little context or explanation. Visual analytics can give autoML users insight into their data and a more complete understanding of the models that autoML discovers, including how multiple models differ. In this work, we describe how visual analytics for automated model discovery differs from traditional visual analytics for machine learning. First, we propose an architecture based on an extension of existing visual analytics frameworks. Then we describe a prototype system, Snowcat, developed according to the presented framework and architecture, that aids users in generating models for a diverse set of data and modeling tasks.
