Generating Adversarial Computer Programs using Optimized Obfuscations

Machine learning (ML) models that learn and predict properties of computer programs are increasingly being adopted and deployed. In this work, we investigate principled ways to adversarially perturb a computer program to fool such learned models, and thus determine their adversarial robustness. As adversarial perturbations, we use program obfuscations, which have conventionally been used to thwart attempts at reverse engineering programs. These perturbations modify programs in ways that do not alter their functionality but can be crafted to deceive an ML model when making a decision. We provide a general formulation for an adversarial program that allows applying multiple obfuscation transformations to a program in any language. We develop first-order optimization algorithms to efficiently determine two key aspects: which parts of the program to transform, and what transformations to use. We show that it is important to optimize both these aspects to generate the best adversarially perturbed program. Due to the discrete nature of this problem, we also propose using randomized smoothing to improve the attack loss landscape and ease optimization. We evaluate our work on Python and Java programs on the problem of program summarization. We show that our best attack proposal achieves a 52% improvement over a state-of-the-art attack generation approach for programs, evaluated on models trained with a SEQ2SEQ architecture. We further show that our formulation is better suited to training models that are robust to adversarial attacks.
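To make the idea concrete, the following is a minimal sketch (not the paper's implementation) of one semantics-preserving obfuscation, variable renaming, applied to a Python function. The transformer class `RenameVar`, the function `total`, and the replacement token `z1` are illustrative choices; in the paper's setting, an attacker would optimize both which identifier to transform (the site) and which token to substitute (the transformation) so as to mislead a model such as a code summarizer, while the program's behavior is provably unchanged.

```python
import ast

class RenameVar(ast.NodeTransformer):
    """Rename one identifier throughout a function: a semantics-preserving obfuscation."""
    def __init__(self, old, new):
        self.old, self.new = old, new

    def visit_Name(self, node):
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node):
        if node.arg == self.old:
            node.arg = self.new
        return node

src = (
    "def total(items):\n"
    "    acc = 0\n"
    "    for x in items:\n"
    "        acc += x\n"
    "    return acc\n"
)

# The attacker's two choices: the site ("acc") and the transformation (rename to "z1").
tree = RenameVar("acc", "z1").visit(ast.parse(src))
obf_src = ast.unparse(tree)  # requires Python 3.9+

# Functionality is unchanged: both versions compute the same result.
env_a, env_b = {}, {}
exec(src, env_a)
exec(obf_src, env_b)
assert env_a["total"]([1, 2, 3]) == env_b["total"]([1, 2, 3])
```

A model that summarizes or labels code sees different tokens (`z1` instead of `acc`) even though the two programs are functionally identical; the paper's contribution is choosing such sites and transformations via first-order optimization rather than by hand or at random.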
