Unacceptable Behavior: Robust PDF Malware Detection Using Abstract Interpretation

The popularity of the PDF format and the rich JavaScript environment that PDF viewers offer make PDF documents an attractive attack vector for malware developers. PDF documents present a serious threat to the security of organizations because most users are unsuspecting of them and thus likely to open documents from untrusted sources. State-of-the-art approaches use machine learning to learn features that characterize PDF malware, which makes them subject to adversarial attacks that mimic the structure of benign documents. In this paper, we instead propose to detect malicious code inside a PDF by statically reasoning about its possible behavior using abstract interpretation. A comparison with state-of-the-art PDF malware detection tools shows that our conservative abstract interpretation approach achieves similar accuracy, is more resilient to evasion attacks, and provides interpretable reports.

[1]  Jeffrey D. Ullman,et al.  Introduction to automata theory, languages, and computation, 2nd edition , 2001, SIGA.

[2]  Angelos Stavrou,et al.  When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors , 2016, NDSS.

[3]  Somesh Jha,et al.  A semantics-based approach to malware detection , 2007, POPL '07.

[4]  Ananthram Swami,et al.  Distillation as a Defense to Adversarial Perturbations Against Deep Neural Networks , 2015, 2016 IEEE Symposium on Security and Privacy (SP).

[5]  Giorgio Giacinto,et al.  A Pattern Recognition System for Malicious PDF Files Detection , 2012, MLDM.

[6]  Marco Pistoia,et al.  Saving the world wide web from vulnerable JavaScript , 2011, ISSTA '11.

[7]  Elmar Gerhards-Padilla,et al.  PDF Scrutinizer: Detecting JavaScript-based attacks in PDF documents , 2012, 2012 Tenth Annual International Conference on Privacy, Security and Trust.

[8]  Somesh Jha,et al.  Semantics-aware malware detection , 2005, 2005 IEEE Symposium on Security and Privacy (S&P'05).

[9]  Xun Lu,et al.  De-obfuscation and Detection of Malicious PDF Files with High Accuracy , 2013, 2013 46th Hawaii International Conference on System Sciences.

[10]  Thomas R. Gross,et al.  Easy to Fool? Testing the Anti-evasion Capabilities of PDF Malware Scanners , 2019, ArXiv.

[11]  Niels Provos,et al.  SHELLOS: Enabling Fast Detection and Forensic Analysis of Code Injection Attacks , 2011, USENIX Security Symposium.

[12]  Sukyoung Ryu,et al.  SAFE: Formal Specification and Implementation of a Scalable Analysis Framework for ECMAScript , 2012 .

[13]  Peter Thiemann,et al.  Type Analysis for JavaScript , 2009, SAS.

[14]  Flemming Nielson,et al.  International Workshop on Principles of Program Analysis , 1999 .

[15]  Yanjun Qi,et al.  Automatically Evading Classifiers: A Case Study on PDF Malware Classifiers , 2016, NDSS.

[16]  Angelos Stavrou,et al.  Malicious PDF detection using metadata and structural features , 2012, ACSAC '12.

[17]  Olivier Levillain,et al.  Caradoc: A Pragmatic Approach to PDF Parsing and Validation , 2016, 2016 IEEE Security and Privacy Workshops (SPW).

[18]  Razvan Benchea,et al.  A practical approach on clustering malicious PDF documents , 2012, Journal in Computer Virology.

[19]  Patrick Cousot,et al.  Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints , 1977, POPL.

[20]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[21]  Pavel Laskov,et al.  Practical Evasion of a Learning-Based Classifier: A Case Study , 2014, 2014 IEEE Symposium on Security and Privacy.

[22]  Gang Wang,et al.  LEMNA: Explaining Deep Learning based Security Applications , 2018, CCS.

[23]  Flemming Nielson,et al.  Principles of Program Analysis , 1999, Springer Berlin Heidelberg.

[24]  Felix C. Freiling,et al.  Using memory management to detect and extract illegitimate code for malware analysis , 2012, ACSAC '12.

[25]  Angelos Stavrou,et al.  Detecting Malicious Javascript in PDF through Document Instrumentation , 2014, 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks.

[26]  Benjamin Livshits,et al.  Rozzle: De-cloaking Internet Malware , 2012, 2012 IEEE Symposium on Security and Privacy.

[27]  Alistair A. Young,et al.  Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) , 2017, MICCAI 2017.

[28]  Cindy Eisner,et al.  Accurate Malware Detection by Extreme Abstraction , 2018, ACSAC.

[29]  Giorgio Giacinto,et al.  Looking at the bag is not enough to find the bomb: an evasion of structural methods for malicious PDF files detection , 2013, ASIA CCS '13.

[30]  Pavel Laskov,et al.  Static detection of malicious JavaScript-bearing PDF documents , 2011, ACSAC '11.

[31]  Mu Zhang,et al.  Extract Me If You Can: Abusing PDF Parsers in Malware Detectors , 2016, NDSS.

[32]  Amir Globerson,et al.  Nightmare at test time: robust learning by feature deletion , 2006, ICML.

[33]  Yuval Elovici,et al.  Keeping pace with the creation of new malicious PDF files using an active-learning based detection framework , 2016, Security Informatics.

[34]  Benjamin Livshits,et al.  ZOZZLE: Fast and Precise In-Browser JavaScript Malware Detection , 2011, USENIX Security Symposium.

[35]  Stefan Katzenbeisser,et al.  Detecting Malicious Code by Model Checking , 2005, DIMVA.

[36]  Meng Xu,et al.  PlatPal: Detecting Malicious Documents with Platform Diversity , 2017, USENIX Security Symposium.

[37]  Thomas R. Dean,et al.  Using clone detection to find malware in acrobat files , 2013, CASCON.

[38]  Christopher Krügel,et al.  A survey on automated dynamic malware-analysis techniques and tools , 2012, CSUR.

[39]  Wenke Lee,et al.  Evading network anomaly detection systems: formal reasoning and practical techniques , 2006, CCS '06.

[40]  Heng Yin,et al.  JSForce: A Forced Execution Engine for Malicious JavaScript Detection , 2017, SecureComm.

[41]  Yuval Elovici,et al.  Detection of malicious PDF files and directions for enhancements: A state-of-the art survey , 2015, Comput. Secur..

[42]  Benjamin Livshits,et al.  NOZZLE: A Defense Against Heap-spraying Code Injection Attacks , 2009, USENIX Security Symposium.

[43]  Yuval Elovici,et al.  ALPD: Active Learning Framework for Enhancing the Detection of Malicious PDF Files , 2014, 2014 IEEE Joint Intelligence and Security Informatics Conference.

[44]  Swarat Chaudhuri,et al.  Extraction of statistically significant malware behaviors , 2013, ACSAC.

[45]  Evangelos P. Markatos,et al.  Combining static and dynamic analysis for the detection of malicious documents , 2011, EUROSEC '11.

[46]  Pavel Laskov,et al.  Hidost: a static machine-learning-based detector of malicious files , 2016, EURASIP J. Inf. Secur..

[47]  C. Schade,et al.  FCScan: A New Lightweight and Effective Approach for Detecting Malicious Content in Electronic Documents , 2013 .

[48]  Pascal Frossard,et al.  Analysis of classifiers’ robustness to adversarial perturbations , 2015, Machine Learning.

[49]  Giorgio Giacinto,et al.  Lux0R: Detection of Malicious PDF-embedded JavaScript code through Discriminant Analysis of API References , 2014, AISec '14.

[50]  Fabio Roli,et al.  Evasion Attacks against Machine Learning at Test Time , 2013, ECML/PKDD.