Towards Making Deep Learning-based Vulnerability Detectors Robust

Automatically detecting software vulnerabilities in source code is an important problem that has attracted much attention. Deep learning-based vulnerability detectors, or DL-based detectors, are particularly attractive because they do not require human experts to define vulnerability features or patterns. However, the robustness of such detectors is unclear. In this paper, we initiate the study of this question by demonstrating that DL-based detectors are not robust against simple code transformations, which we call attacks because they may be leveraged for malicious purposes. As a first step towards making DL-based detectors robust against such attacks, we propose a framework, dubbed ZigZag, that is centered on (i) decoupling feature learning from classifier learning and (ii) using a ZigZag-style strategy to iteratively refine the two until they converge to robust features and robust classifiers. Experimental results show that the ZigZag framework can substantially improve the robustness of DL-based detectors.
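To make the notion of a simple code transformation concrete, the following minimal sketch renames a local variable in a C snippet without changing its behavior. The abstract does not list the transformations studied, so identifier renaming is an assumption chosen purely for illustration; the function `rename_identifier` and the sample snippet are hypothetical and not taken from the paper.

```python
import re

# Hypothetical C fragment used only for illustration (not from the paper).
SNIPPET = """void copy(char *src) {
    char buf[16];
    strcpy(buf, src);
}"""


def rename_identifier(source: str, old: str, new: str) -> str:
    """Rename every whole-word occurrence of `old` to `new`.

    Assumption: this is a naive, regex-based sketch. It does not parse C,
    so it would also rewrite matches inside comments or string literals.
    """
    # Refuse a target name that already occurs, to avoid accidental capture.
    if re.search(r"\b" + re.escape(new) + r"\b", source):
        raise ValueError(f"identifier {new!r} already appears in the source")
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    return pattern.sub(new, source)


# The program logic is unchanged; only the surface tokens differ.
transformed = rename_identifier(SNIPPET, "buf", "v0")
print(transformed)
```

The transformed snippet still contains the same unchecked `strcpy` call, so a robust detector would be expected to assign it the same label as the original.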
