Abstract Symbolic Automata: Mixed syntactic/semantic similarity analysis of executables

We introduce a model for mixed syntactic/semantic approximation of programs based on symbolic finite automata (SFA). The edges of an SFA are labeled by predicates whose semantics specifies the denotations allowed by the edge. We introduce the notion of abstract symbolic finite automaton (ASFA), in which approximation is obtained by abstract interpretation of symbolic finite automata, acting at both the syntactic (predicate) and the semantic (denotation) level. We investigate in detail how the syntactic and semantic abstractions of SFA relate to each other and how they contribute to determining the recognized language. We then introduce a family of transformations for simplifying ASFA. We apply this model to prove properties of commonly used tools for similarity analysis of binary executables. Following the structure of their control flow graphs, disassembled binary executables are represented as (concrete) SFA, where states are program points and predicates represent the (possibly infinite) I/O semantics of each basic block in constraint form. Known tools for binary code analysis are viewed as specific choices of symbolic and semantic abstractions in our framework, making symbolic finite automata and their abstract interpretations a unifying model for comparing and reasoning about the soundness and completeness of analyses of low-level code.
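
To make the idea of predicate-labeled edges and their abstraction concrete, the following is a minimal sketch and not the paper's construction. It assumes an integer alphabet, models predicates as plain Python functions, and uses a hypothetical sign abstraction computed over a small finite sample universe; the names `SFA`, `weaken_to_sign`, and `abstract_edges` are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

# Assumption: the alphabet is the integers and a predicate is a Python function.
Predicate = Callable[[int], bool]


@dataclass
class SFA:
    initial: str
    finals: Set[str]
    # Each edge is (source state, predicate, target state).
    edges: List[Tuple[str, Predicate, str]]

    def accepts(self, word: List[int]) -> bool:
        """Nondeterministic run: follow every edge whose predicate holds."""
        current = {self.initial}
        for symbol in word:
            current = {dst for (src, pred, dst) in self.edges
                       if src in current and pred(symbol)}
            if not current:
                return False
        return bool(current & self.finals)


def sign(x: int) -> int:
    return (x > 0) - (x < 0)


def weaken_to_sign(pred: Predicate, universe=range(-10, 11)) -> Predicate:
    """Hypothetical semantic abstraction: keep only the signs of the values
    that satisfy pred within a finite sample universe (an assumption made so
    that the abstraction is computable here)."""
    signs = {sign(v) for v in universe if pred(v)}
    return lambda x: sign(x) in signs


def abstract_edges(sfa: SFA, weaken) -> SFA:
    """Apply the abstraction to every edge label, leaving the states unchanged."""
    return SFA(sfa.initial, set(sfa.finals),
               [(s, weaken(p), t) for (s, p, t) in sfa.edges])


# q0 --(x == 1)--> q1 --(x > 5)--> q2
concrete = SFA("q0", {"q2"}, [
    ("q0", lambda x: x == 1, "q1"),
    ("q1", lambda x: x > 5, "q2"),
])
abstract = abstract_edges(concrete, weaken_to_sign)

print(concrete.accepts([1, 7]))   # True
print(concrete.accepts([3, 7]))   # False
print(abstract.accepts([3, 7]))   # True: both symbols are positive
print(abstract.accepts([-1, 7]))  # False: a negative first symbol is excluded
```

In this sketch the abstract automaton recognizes a larger language than the concrete one, which is the over-approximation behavior one would expect from weakening edge predicates; a realistic setting would replace the Python functions with constraints handled by a solver.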
