COSET: A Benchmark for Evaluating Neural Program Embeddings

Neural program embedding can be helpful in analyzing large software, a task that is challenging for traditional logic-based program analyses due to their limited scalability. A key focus of recent machine-learning advances in this area is on modeling program semantics instead of just syntax. Unfortunately evaluating such advances is not obvious, as program semantics does not lend itself to straightforward metrics. In this paper, we introduce a benchmarking framework called COSET for standardizing the evaluation of neural program embeddings. COSET consists of a diverse dataset of programs in source-code format, labeled by human experts according to a number of program properties of interest. A point of novelty is a suite of program transformations included in COSET. These transformations when applied to the base dataset can simulate natural changes to program code due to optimization and refactoring and can serve as a "debugging" tool for classification mistakes. We conducted a pilot study on four prominent models: TreeLSTM, gated graph neural network (GGNN), AST-Path neural network (APNN), and DYPRO. We found that COSET is useful in identifying the strengths and limitations of each model and in pinpointing specific syntactic and semantic characteristics of programs that pose challenges.

[1]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[2]  Tao Wang,et al.  Convolutional Neural Networks over Tree Structures for Programming Language Processing , 2014, AAAI.

[3]  Ke Wang,et al.  Learning Scalable and Precise Representation of Program Semantics , 2019, ArXiv.

[4]  G. Goth Beware the March of this IDE: Eclipse is overshadowing other tool technologies , 2005, IEEE Software.

[5]  Armando Solar-Lezama,et al.  sk_p: a neural program corrector for MOOCs , 2016, SPLASH.

[6]  Marc Brockschmidt,et al.  Learning to Represent Programs with Graphs , 2017, ICLR.

[7]  Christopher D. Manning,et al.  Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks , 2015, ACL.

[8]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[9]  Rahul Gupta,et al.  DeepFix: Fixing Common C Language Errors by Deep Learning , 2017, AAAI.

[10]  Martin T. Vechev,et al.  PHOG: Probabilistic Model for Code , 2016, ICML.

[11]  Daniel Tarlow,et al.  Structured Generative Models of Natural Source Code , 2014, ICML.

[12]  Rishabh Singh,et al.  Search, align, and repair: data-driven feedback generation for introductory programming exercises , 2017, PLDI.

[13]  Aditya V. Thakur,et al.  Path-based function embeddings , 2018, ICSE.

[14]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools (2nd Edition) , 2006 .

[16]  Alex Krizhevsky,et al.  Learning Multiple Layers of Features from Tiny Images , 2009 .

[17]  Uri Alon,et al.  code2vec: learning distributed representations of code , 2018, Proc. ACM Program. Lang..

[18]  Zhendong Su,et al.  On the naturalness of software , 2012, ICSE 2012.

[19]  Shuvendu K. Lahiri,et al.  Code vectors: understanding programs through embedded abstracted symbolic traces , 2018, ESEC/SIGSOFT FSE.

[20]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.