PatchRNN: A Deep Learning-Based System for Security Patch Identification

With the increasing use of open-source software (OSS) components, vulnerabilities embedded within them propagate to a huge number of applications that depend on them. In practice, applying security patches to downstream software in a timely manner is challenging. The main reason is that many such patches do not explicitly indicate their security impact in the documentation, which makes them difficult for software maintainers and users to recognize. However, attackers can still identify these “secret” security patches by analyzing the source code, and can generate corresponding exploits to compromise not only unpatched versions of the same software but also similar software packages that may contain the same vulnerability due to code cloning or similar design/implementation logic. Therefore, it is critical to identify these secret security patches to enable timely fixes. To this end, we propose a deep learning-based defense system called PatchRNN to automatically identify secret security patches in OSS. In addition to descriptive keywords in the commit message (i.e., text-level features), we leverage both syntactic and semantic features at the source-code level. To evaluate the performance of our system, we apply it to a large-scale real-world patch dataset and conduct a case study on NGINX, a popular open-source web server. Experimental results show that PatchRNN can successfully detect secret security patches with a low false positive rate.
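
To make the two feature levels concrete, the following is a minimal sketch of how a patch could be separated into a text-level input (the commit message) and a source-code-level input (the added and removed lines) before either is passed to a model. It is not the authors' pipeline: the function names `split_patch`, `tokenize_message`, and `tokenize_code`, the regular-expression tokenizers, and the sample patch are all assumptions chosen for illustration, and the abstract does not specify how PatchRNN actually tokenizes or encodes its inputs.

```python
import re


def split_patch(patch_text):
    """Split a git-style patch into commit message, added lines, removed lines.

    Hypothetical helper: assumes the message precedes the first 'diff --git' line.
    """
    message_lines, added, removed = [], [], []
    in_diff = False
    for line in patch_text.splitlines():
        if line.startswith("diff --git"):
            in_diff = True
            continue
        if not in_diff:
            message_lines.append(line)
        elif line.startswith("+") and not line.startswith("+++"):
            added.append(line[1:])      # drop the leading '+' marker
        elif line.startswith("-") and not line.startswith("---"):
            removed.append(line[1:])    # drop the leading '-' marker
    return "\n".join(message_lines).strip(), added, removed


def tokenize_message(message):
    """Text level: lowercase alphabetic words from the commit message."""
    return re.findall(r"[a-z]+", message.lower())


def tokenize_code(lines):
    """Code level: identifiers, integer literals, and single punctuation marks."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", "\n".join(lines))


# Made-up patch used only to exercise the helpers above.
EXAMPLE_PATCH = """\
Fix length handling in parser

diff --git a/src/parse.c b/src/parse.c
--- a/src/parse.c
+++ b/src/parse.c
@@ -10,3 +10,5 @@
-    memcpy(dst, src, len);
+    if (len > sizeof(dst))
+        return -1;
+    memcpy(dst, src, len);
"""

message, added, removed = split_patch(EXAMPLE_PATCH)
text_tokens = tokenize_message(message)
added_tokens = tokenize_code(added)
removed_tokens = tokenize_code(removed)
```

In this made-up example the commit message never mentions security, while the added lines introduce a bounds check before `memcpy`. That is the kind of signal that is only visible at the source-code level, which is why the abstract pairs the two. The header filter is deliberately simple: a removed code line that itself begins with `--` would be mistaken for a file header, so a real system would parse hunks more carefully.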
