SynEva: Evaluating ML Programs by Mirror Program Synthesis

Machine learning (ML) programs are widely used in human-related applications. However, testing them remains challenging: one can hardly decide whether, and to what extent, the knowledge extracted from training scenarios suits new scenarios. Existing approaches have restricted applicability because they assume the availability of an oracle, a comparable implementation, or manual inspection effort. We address this problem with SynEva, a novel approach based on program synthesis that systematically constructs an oracle-like mirror program for similarity measurement and automatically compares it with the existing knowledge on new scenarios to decide how well that knowledge suits them. SynEva is lightweight and fully automated. Our experimental evaluation on real-world data sets validates SynEva's effectiveness, showing strong correlation results with little overhead. We expect that SynEva can be applied to, and help evaluate, more ML programs in new scenarios.
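The mirror-program idea can be illustrated with a toy sketch. The abstract does not describe SynEva's synthesis language, search procedure, or similarity measure, so everything below is a hypothetical stand-in: the "DSL" is single threshold rules of the form `x[i] > t`, synthesis is brute-force enumeration, and similarity is the plain agreement rate between the ML program and its mirror. The names `ml_program`, `synthesize_mirror`, and `agreement` are illustrative only, not SynEva's API.

```python
from typing import Callable, List, Sequence, Tuple

Input = Sequence[float]
Program = Callable[[Input], int]


def synthesize_mirror(ml_program: Program, inputs: List[Input]) -> Program:
    """Hypothetical synthesis: enumerate threshold rules `x[i] > t` (or their
    negation) and keep the rule that best reproduces ml_program's labels."""
    labels = [ml_program(x) for x in inputs]
    best_rule: Tuple[int, float, bool] = (0, 0.0, True)
    best_score = -1
    for i in range(len(inputs[0])):
        for t in sorted({x[i] for x in inputs}):
            for positive_above in (True, False):
                # Count inputs where the candidate rule matches the ML label.
                score = sum(
                    int((x[i] > t) == positive_above) == y
                    for x, y in zip(inputs, labels)
                )
                if score > best_score:
                    best_score, best_rule = score, (i, t, positive_above)

    i, t, positive_above = best_rule

    def mirror(x: Input) -> int:
        return int((x[i] > t) == positive_above)

    return mirror


def agreement(p: Program, q: Program, inputs: List[Input]) -> float:
    """Hypothetical similarity measure: fraction of inputs where p and q agree."""
    return sum(p(x) == q(x) for x in inputs) / len(inputs)


# Hypothetical ML program under evaluation.
def ml_program(x: Input) -> int:
    return int(x[0] + x[1] > 1.0)


# Training scenario: the second feature is always 0.
train_inputs = [(0.2, 0.0), (0.6, 0.0), (0.9, 0.0), (1.2, 0.0), (1.5, 0.0), (2.0, 0.0)]
# New scenario: the second feature now varies.
new_inputs = [(0.5, 0.8), (0.7, 0.6), (0.3, 0.9), (1.4, -0.6), (1.1, -0.5), (2.0, 0.0)]

mirror = synthesize_mirror(ml_program, train_inputs)
print(agreement(ml_program, mirror, train_inputs))  # 1.0
print(agreement(ml_program, mirror, new_inputs))    # about 0.167 (1 of 6)
```

Here the mirror captures the ML program's behaviour perfectly on the training scenario but agrees on only one of six new-scenario inputs, which, under this toy reading, would flag that the knowledge learned from training does not carry over well.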
