Model LineUpper: Supporting Interactive Model Comparison at Multiple Levels for AutoML

Automated Machine Learning (AutoML) is a rapidly growing set of technologies that automate the model development pipeline by searching the model space and generating candidate models. A critical final step of AutoML is the human selection of a final model from dozens of candidates. In current AutoML systems, this selection is supported only by performance metrics. Prior work has shown that, in practice, people evaluate ML models on additional criteria, such as the way a model makes predictions. Comparison may happen at multiple levels: from types of errors, to feature importance, to how a model makes predictions for specific instances. We developed Model LineUpper to support interactive model comparison for AutoML by integrating multiple Explainable AI (XAI) and visualization techniques. We conducted a user study in which we both evaluated the system and used it as a technology probe to understand how users perform model comparison in an AutoML system. We discuss design implications for using XAI techniques for model comparison and for supporting the unique needs of data scientists comparing AutoML models.
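As a minimal illustration of why a single performance metric can hide differences between candidates, the following sketch contrasts two classifiers at three levels: an aggregate metric (accuracy), error types (confusion counts), and specific instances (where the models disagree). It is plain Python with hypothetical function names (`confusion_counts`, `compare_models`) and toy labels. It is not Model LineUpper's implementation. Feature-importance comparison, which would need an XAI method such as SHAP or LIME, is omitted.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (true label, predicted label) pairs; each off-diagonal pair is one error type."""
    return Counter(zip(y_true, y_pred))


def compare_models(y_true, preds_a, preds_b):
    """Hypothetical helper: summarize two candidate models at the metric,
    error-type, and instance levels."""
    n = len(y_true)
    acc_a = sum(t == p for t, p in zip(y_true, preds_a)) / n
    acc_b = sum(t == p for t, p in zip(y_true, preds_b)) / n
    # Instance level: indices where the two candidates predict different labels.
    disagreements = [i for i in range(n) if preds_a[i] != preds_b[i]]
    return {
        "accuracy": (acc_a, acc_b),
        "confusion": (confusion_counts(y_true, preds_a),
                      confusion_counts(y_true, preds_b)),
        "disagreements": disagreements,
    }


# Toy data (illustrative only): both models are right on 4 of 6 instances.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]  # one false negative, one false positive
model_b = [1, 1, 1, 1, 0, 1]  # two false positives

result = compare_models(y_true, model_a, model_b)
print(result["accuracy"])
print(result["confusion"])
print(result["disagreements"])
```

Both toy models have identical accuracy. Model A, however, makes one false negative and one false positive, while model B makes two false positives, and the two disagree on instances 1 and 2. That kind of difference only surfaces at the error-type and instance levels.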
