Code Obfuscation and Malware Detection by Abstract Interpretation

Functions. We already observed in Section 5.2.2 that a function $f : \mathbb{Z} \to \mathbb{Z}$ is decomposed into elementary functions, i.e., assembly instructions within some basic block. Following the same approach, let us assume that $f$ can be expressed as a composition of elementary functions, namely

$$f = \lambda x.\, h(g_1(x, \ldots, x), \ldots, g_k(x, \ldots, x))$$

where $h : \mathbb{Z}^k \to \mathbb{Z}$ and $g_i : \mathbb{Z}^{n_i} \to \mathbb{Z}$. More generally, each $g_i$ can be further decomposed into elementary functions. For example, $f(x) = x^2 + x$ is decomposed as $h(g_1(x), g_2(x))$ where $h(x, y) = x + y$, $g_1(x) = x^2$ and $g_2(x) = x$.

Let us consider the pointwise extensions of the elementary functions, which are still denoted, with a slight abuse of notation, by $h : \wp(\mathbb{Z})^k \to \wp(\mathbb{Z})$ and $g_i : \wp(\mathbb{Z})^{n_i} \to \wp(\mathbb{Z})$, and let us denote their composition by

$$F \stackrel{\mathrm{def}}{=} \lambda X.\, h(g_1(X, \ldots, X), \ldots, g_k(X, \ldots, X)) : \wp(\mathbb{Z}) \to \wp(\mathbb{Z}).$$

For example, for the above decomposition $f(x) = x^2 + x = h(g_1(x), g_2(x))$, the function $F : \wp(\mathbb{Z}) \to \wp(\mathbb{Z})$ is $F(X) = \{y^2 + z \mid y, z \in X\}$. Observe that $F$ does not coincide with the pointwise extension of $f$, which maps $X$ to $\{f(x) \mid x \in X\}$: we have $F(\{1, 2\}) = \{2, 3, 5, 6\}$, while the pointwise extension of $f$ maps $\{1, 2\}$ to $\{2, 6\}$. Let us also notice that $F$ on singletons coincides with $f$, namely for any $x \in \mathbb{Z}$, $F(\{x\}) = \{f(x)\}$. Thus, the concrete test CT can be equivalently formulated as $\forall x \in \mathbb{Z} : F(\{x\}) \subseteq n\mathbb{Z}$.

Let $A \in \mathrm{uco}(\wp(\mathbb{Z}))$ be an abstract domain such that there exists some $a_n \in A$ with $\gamma_A(a_n) = n\mathbb{Z}$. The attacker $A$ approximates the computation of $F : \wp(\mathbb{Z}) \to \wp(\mathbb{Z})$ step by step, meaning that $A$ approximates every elementary function composing $F$. Thus, the abstract function $F^\sharp : A \to A$ is defined as the composition of the best correct approximations $h^A$ and $g_i^A$ on $A$ of the elementary functions, namely:

$$F^\sharp(a) \stackrel{\mathrm{def}}{=} \alpha_A\bigl(h\bigl(\gamma_A(\alpha_A(g_1(\gamma_A(a), \ldots, \gamma_A(a)))), \ldots, \gamma_A(\alpha_A(g_k(\gamma_A(a), \ldots, \gamma_A(a))))\bigr)\bigr) = h^A(g_1^A(a), \ldots, g_k^A(a)).$$

When the abstract test $AT^\sharp_A$ for $F^\sharp$ on $A$ holds, the attacker modeled by the abstract domain $A$ classifies the predicate $n \mid f(x)$ as opaque.
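The difference between $F$ and the pointwise extension of $f$ can be checked mechanically on finite sets. The following is a minimal Python sketch for the running example $f(x) = x^2 + x$; the helper names (`lift1`, `lift2`, `f_pointwise`, `concrete_test`) are illustrative and do not come from the text, and the concrete test is only sampled on a finite range rather than proved over all of $\mathbb{Z}$.

```python
from itertools import product


def f(x: int) -> int:
    """The concrete function f(x) = x^2 + x."""
    return x * x + x


# Elementary functions of the decomposition f(x) = h(g1(x), g2(x)).
def g1(x: int) -> int:
    return x * x


def g2(x: int) -> int:
    return x


def h(x: int, y: int) -> int:
    return x + y


def lift1(g):
    """Pointwise extension of a unary function to finite sets of integers."""
    return lambda xs: frozenset(g(x) for x in xs)


def lift2(op):
    """Pointwise extension of a binary function to pairs of finite sets."""
    return lambda xs, ys: frozenset(op(x, y) for x, y in product(xs, ys))


def F(xs):
    """Composition of the lifted elementary functions: {y^2 + z | y, z in xs}."""
    return lift2(h)(lift1(g1)(xs), lift1(g2)(xs))


# Pointwise extension of f itself: {f(x) | x in xs}.
f_pointwise = lift1(f)


def concrete_test(n: int, sample) -> bool:
    """Check F({x}) is a subset of nZ for every x in a finite sample (not a proof)."""
    return all(y % n == 0 for x in sample for y in F(frozenset({x})))


X = frozenset({1, 2})
print(sorted(F(X)))                         # [2, 3, 5, 6]
print(sorted(f_pointwise(X)))               # [2, 6]
print(concrete_test(2, range(-50, 51)))     # True: x^2 + x = x(x + 1) is even
print(concrete_test(3, range(-50, 51)))     # False: f(1) = 2 is not a multiple of 3
```

Lifting each elementary function separately loses the correlation between the two occurrences of $x$, which is why $F(\{1, 2\})$ contains the extra values 3 and 5.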
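It turns out that $F^\sharp$ is a correct approximation of $F$ on $A$, namely $\alpha_A \circ F \sqsubseteq_A F^\sharp \circ \alpha_A$, and this guarantees the soundness of the abstract test $AT^\sharp_A$.

Corollary 5.8. $AT^\sharp_A$ is sound.

Proof. We first show that $F^\sharp : A \to A$ is a sound approximation of $F : \wp(\mathbb{Z}) \to \wp(\mathbb{Z})$, namely $\forall X \in \wp(\mathbb{Z}) : \alpha_A(F(X)) \leq_A F^\sharp(\alpha_A(X))$. In fact, for any $X \in \wp(\mathbb{Z})$:

$$
\begin{aligned}
\alpha_A(F(X)) &= \alpha_A\bigl(h(g_1(X, \ldots, X), \ldots, g_k(X, \ldots, X))\bigr) \\
&\leq_A \alpha_A\bigl(h\bigl(\gamma_A(\alpha_A(g_1(X, \ldots, X))), \ldots, \gamma_A(\alpha_A(g_k(X, \ldots, X)))\bigr)\bigr) \\
&\leq_A \alpha_A\bigl(h\bigl(\gamma_A(\alpha_A(g_1(\gamma_A(\alpha_A(X)), \ldots, \gamma_A(\alpha_A(X))))), \ldots, \gamma_A(\alpha_A(g_k(\gamma_A(\alpha_A(X)), \ldots, \gamma_A(\alpha_A(X)))))\bigr)\bigr) \\
&= F^\sharp(\alpha_A(X)).
\end{aligned}
$$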
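The soundness inequality $\alpha_A(F(X)) \leq_A F^\sharp(\alpha_A(X))$ can be illustrated on a small domain. The sketch below assumes the parity domain $\{\bot, \mathrm{even}, \mathrm{odd}, \top\}$ as the attacker $A$ with $n = 2$, which the text does not prescribe. The abstract operations are hand-written transfer functions intended to match the best correct approximations of squaring, identity and addition on parity. The final check assumes that the abstract test has the form $F^\sharp(\alpha_A(\{x\})) \leq_A a_n$ for every $x$, a definition this section does not restate. All names are illustrative.

```python
import random
from itertools import product

# Parity lattice: BOT <= EVEN, ODD <= TOP (EVEN and ODD are incomparable).
BOT, EVEN, ODD, TOP = "bot", "even", "odd", "top"


def leq(a: str, b: str) -> bool:
    """Partial order of the parity lattice."""
    return a == b or a == BOT or b == TOP


def alpha(xs) -> str:
    """Abstraction of a finite set of integers to its parity."""
    parities = {x % 2 for x in xs}
    if not parities:
        return BOT
    if parities == {0}:
        return EVEN
    if parities == {1}:
        return ODD
    return TOP


def sq_abs(a: str) -> str:
    """Abstract squaring: squaring preserves parity."""
    return a


def id_abs(a: str) -> str:
    """Abstract identity."""
    return a


def add_abs(a: str, b: str) -> str:
    """Abstract addition on parities."""
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    return EVEN if a == b else ODD


def F(xs) -> frozenset:
    """Concrete F(X) = {y^2 + z | y, z in X} for f(x) = x^2 + x."""
    return frozenset(y * y + z for y, z in product(xs, xs))


def F_abs(a: str) -> str:
    """Abstract function: composition of the abstract elementary operations."""
    return add_abs(sq_abs(a), id_abs(a))


def check_soundness(trials: int = 1000, seed: int = 0) -> bool:
    """Sample finite sets X and check alpha(F(X)) <= F_abs(alpha(X))."""
    rng = random.Random(seed)
    for _ in range(trials):
        X = frozenset(rng.sample(range(-20, 21), rng.randint(0, 4)))
        if not leq(alpha(F(X)), F_abs(alpha(X))):
            return False
    return True


def abstract_test(sample) -> bool:
    """Assumed form of the abstract test for n = 2, checked on a finite sample."""
    return all(leq(F_abs(alpha({x})), EVEN) for x in sample)


print(check_soundness())                 # True
print(abstract_test(range(-50, 51)))     # True
```

Under these assumptions, an attacker observing only parity would classify the predicate $2 \mid x^2 + x$ as opaque, since both abstract inputs even and odd are mapped to even.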
