Ariadne: analysis for machine learning programs

Machine learning has transformed domains like vision and translation, and is now increasingly used in science, where the correctness of such code is vital. Python is popular for machine learning, in part because of its wealth of machine learning libraries, and is felt to make development faster; however, as a dynamic language it has less support for catching errors as code is written than developers get from tools like Eclipse. This is especially problematic for machine learning: given its statistical nature, code with subtle errors may run and produce results that look plausible but are meaningless. Such errors can invalidate scientific results. We report on Ariadne: applying a static analysis framework, WALA, to machine learning code that uses TensorFlow. We have created a static analysis for Python, a type system for tracking tensors (TensorFlow's core data structures), and a data flow analysis to track their usage. We report on how it was built and present some early results.
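
To make the idea of a type system for tensors more concrete, the following is a minimal sketch in plain Python. It is not Ariadne's or WALA's implementation: `TensorType`, `ShapeError`, `reshape`, and `matmul` are hypothetical names invented for illustration, and the shapes are assumed to be fully known constants. It only shows how tracking element type and shape through a chain of operations can flag a mismatch before any data is processed.

```python
from dataclasses import dataclass
from math import prod
from typing import Tuple


@dataclass(frozen=True)
class TensorType:
    """Hypothetical abstract type: a tensor's element type and static shape."""
    dtype: str
    shape: Tuple[int, ...]


class ShapeError(Exception):
    """Raised when operand shapes are incompatible for an operation."""


def reshape(t: TensorType, new_shape: Tuple[int, ...]) -> TensorType:
    # A reshape is valid only if it preserves the total number of elements.
    if prod(t.shape) != prod(new_shape):
        raise ShapeError(f"cannot reshape {t.shape} to {new_shape}")
    return TensorType(t.dtype, new_shape)


def matmul(a: TensorType, b: TensorType) -> TensorType:
    # Rank-2 matrix multiply: element types and inner dimensions must agree.
    if a.dtype != b.dtype:
        raise ShapeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")
    if len(a.shape) != 2 or len(b.shape) != 2 or a.shape[1] != b.shape[0]:
        raise ShapeError(f"cannot multiply {a.shape} by {b.shape}")
    return TensorType(a.dtype, (a.shape[0], b.shape[1]))


# Propagate types through a small pipeline: flatten images, apply weights.
images = TensorType("float32", (100, 28, 28))
weights = TensorType("float32", (784, 10))
flat = reshape(images, (100, 784))
logits = matmul(flat, weights)
print(logits.shape)

# Forgetting the reshape is reported from the types alone.
try:
    matmul(images, weights)
except ShapeError as err:
    print("shape error:", err)
```

The sketch checks shapes on abstract types rather than on real tensors, which is the property that lets such a check run at code creation time; a realistic analysis would also need to handle unknown dimensions and values that flow through variables and calls.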
