Project Achilles: A Prototype Tool for Static Method-Level Vulnerability Detection of Java Source Code Using a Recurrent Neural Network

Software has become an essential component of modern life, and because software vulnerabilities threaten the security of its users, new ways of analyzing software for security flaws must be explored. Using the National Institute of Standards and Technology's Juliet Java Suite, which contains thousands of examples of defective Java methods covering a variety of vulnerabilities, a prototype tool was developed that implements an array of Long Short-Term Memory (LSTM) recurrent neural networks to detect vulnerabilities in source code. The tool applies several data preparation methods to make it independent of coding style and to automate method extraction, data labeling, and dataset partitioning. The result is a prototype command-line utility that generates an n-dimensional vulnerability prediction vector. An experimental evaluation using 44,495 test cases indicates that the tool can achieve an accuracy higher than 90% for 24 out of 29 different types of CWE vulnerabilities.
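As a rough illustration of how an array of per-vulnerability detectors could yield an n-dimensional prediction vector, the sketch below normalizes a Java method and queries one scorer per CWE type. It is not the tool's implementation: the function names, the assumption of one scorer per CWE, and the keyword-based stand-in scorers (used here in place of trained LSTM networks) are all hypothetical.

```python
import re
from typing import Callable, Dict, List

# A scorer maps normalized method text to a score in [0, 1].
# In the tool described above this role is played by a trained LSTM;
# here it is only a placeholder type (assumption).
Scorer = Callable[[str], float]


def normalize_method(source: str) -> str:
    """Reduce coding-style differences: drop comments, collapse whitespace.

    Naive sketch: a '//' or '/*' inside a string literal would also be stripped.
    """
    source = re.sub(r"/\*.*?\*/", " ", source, flags=re.DOTALL)  # block comments
    source = re.sub(r"//[^\n]*", " ", source)                     # line comments
    return re.sub(r"\s+", " ", source).strip()


def prediction_vector(method_source: str, scorers: Dict[str, Scorer]) -> List[float]:
    """Return one score per CWE, ordered by sorted CWE key (n = len(scorers))."""
    text = normalize_method(method_source)
    return [scorers[cwe](text) for cwe in sorted(scorers)]


# Hypothetical stand-in scorers; simple keyword heuristics, not neural networks.
example_scorers: Dict[str, Scorer] = {
    "CWE-78": lambda t: 0.9 if "Runtime.getRuntime().exec(" in t else 0.1,
    "CWE-89": lambda t: 0.9 if "executeQuery(" in t and "+" in t else 0.1,
}

example_method = (
    "void f(String c) throws Exception { // run\n"
    "    Runtime.getRuntime().exec(c); }"
)

if __name__ == "__main__":
    print(prediction_vector(example_method, example_scorers))
```

Keeping each CWE behind its own scorer means a detector can be retrained or replaced without affecting the other entries of the vector, which is one plausible reason to use an array of networks rather than a single multi-class model.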
