End-to-End Prediction of Buffer Overruns from Raw Source Code via Neural Memory Networks

Detecting buffer overruns in source code is one of the most common yet challenging tasks in program analysis. Current approaches based on rigid rules and handcrafted features lack flexibility and robustness against the diverse bug patterns and characteristics found in sophisticated real-world software. In this paper, we propose a novel, data-driven approach that is completely end-to-end, requiring no hand-crafted features and therefore free from programming-language-specific structural limitations. In particular, our approach leverages a recently proposed neural network model called the memory network, which has achieved state-of-the-art performance mainly on question-answering tasks. Experimental results on source code samples demonstrate that our proposed model accurately detects different types of buffer overruns. We also present in-depth analyses of how a memory network can learn to understand the semantics of programming languages solely from raw source code, such as tracing variables of interest, identifying numerical values, and performing quantitative comparisons between them.
