SCRAM: Simple Checks for Realtime Analysis of Model Training for Non-Expert ML Programmers

Many non-expert machine learning users wish to apply powerful deep learning models to their own domains but encounter hurdles in the opaque model tuning process. We introduce SCRAM, a tool that uses heuristics to detect potential error conditions in model output and suggests actionable steps and best practices to help such users tune their models. Inspired by metaphors from software engineering, SCRAM extends high-level deep learning development tools to interpret model metrics during training and to produce human-readable error messages. We validate SCRAM through three author-created example scenarios with image and text datasets, and by collecting informal feedback from ML researchers with teaching experience. Finally, we reflect on this feedback to inform the design of future ML debugging tools.
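To make the idea concrete, the sketch below shows the kind of heuristic check the abstract describes: a callback that watches training metrics and emits human-readable guidance. This is our own minimal illustration using a Keras-style callback, not the authors' implementation; the thresholds, messages, and class name are assumptions.

```python
# Illustrative sketch only (not the SCRAM implementation): a Keras callback
# that applies simple heuristics to training metrics and prints actionable,
# human-readable suggestions. Thresholds and wording are assumed for the example.
import math
import tensorflow as tf

class SimpleTrainingChecks(tf.keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        loss = logs.get("loss")
        val_loss = logs.get("val_loss")

        # Heuristic 1: NaN or infinite loss often indicates a learning rate
        # that is too high or unnormalized input data.
        if loss is not None and (math.isnan(loss) or math.isinf(loss)):
            print(f"[check] Epoch {epoch}: loss is {loss}. "
                  "Try lowering the learning rate or normalizing your inputs.")

        # Heuristic 2: validation loss far above training loss suggests
        # overfitting; suggest regularization or more data.
        if loss is not None and val_loss is not None and val_loss > 2.0 * loss:
            print(f"[check] Epoch {epoch}: val_loss ({val_loss:.3f}) is much "
                  f"higher than loss ({loss:.3f}). The model may be overfitting; "
                  "consider dropout, weight decay, or more training data.")

# Example usage (hypothetical model and data):
# model.fit(x, y, validation_split=0.2, callbacks=[SimpleTrainingChecks()])
```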
