P2C: Understanding Output Data Files via On-the-Fly Transformation from Producer to Consumer Executions

In cyber attack analysis, it is often highly desirable to understand the meaning of an unknown file or network message in the absence of their consumer (i.e. the program that parses and understands the file/message). For example, a malware may stealthily collect information from a victim machine, store them as a file and later send it to a remote server. P2C is a novel technique that can parse and understand unknown files and network messages. Given a file/message that was generated in the past without the presence of any monitoring techniques, and a set of potential producers of the file/message, P2C systematically explores the execution paths in the producers without requiring any inputs. In the mean time, it tries to transform a producer execution to a consumer execution that closely resembles the ideal consumer execution that can parse the given unknown file/message. In particular, when a write operation is encountered in the original execution, P2C performs the opposite read operation on the unknown file/message and patches the original execution with the loaded value. In order to handle correlations between data fields in the file/message, P2C follows a trial-and-error approach to look for the correct transformation until the file/message can be parsed and the meaning of their fields can be disclosed. Our experiments on a set of real world applications demonstrate P2C is highly effective.

[1]  Thomas W. Reps,et al.  Analyzing Memory Accesses in x86 Executables , 2004, CC.

[2]  Helen J. Wang,et al.  Tupni: automatic reverse engineering of input formats , 2008, CCS.

[3]  David Brumley,et al.  TIE: Principled Reverse Engineering of Types in Binary Programs , 2011, NDSS.

[4]  Thomas W. Reps,et al.  Extracting Output Formats from Executables , 2006, 2006 13th Working Conference on Reverse Engineering.

[5]  Christopher Krügel,et al.  Automatic Network Protocol Analysis , 2008, NDSS.

[6]  Thomas W. Reps,et al.  Improved Memory-Access Analysis for x86 Executables , 2008, CC.

[7]  Jonathon T. Giffin,et al.  2011 IEEE Symposium on Security and Privacy Virtuoso: Narrowing the Semantic Gap in Virtual Machine Introspection , 2022 .

[8]  Zhenkai Liang,et al.  Polyglot: automatic extraction of protocol message format using dynamic binary analysis , 2007, CCS '07.

[9]  Xiangyu Zhang,et al.  Automatic Reverse Engineering of Data Structures from Binary Execution , 2010, NDSS.

[10]  Harish Patil,et al.  Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[11]  Zhi Wang,et al.  ReFormat: Automatic Reverse Engineering of Encrypted Messages , 2009, ESORICS.

[12]  Stephen McCamant,et al.  Binary Code Extraction and Interface Identification for Security Applications , 2009, NDSS.

[13]  Polyglot : Automatic Extraction of Protocol Format using Dynamic Binary Analysis , 2007 .

[14]  Fei Peng,et al.  X-Force: Force-Executing Binary Programs for Security Applications , 2014, USENIX Security Symposium.

[15]  Herbert Bos,et al.  Howard: A Dynamic Excavator for Reverse Engineering Data Structures , 2011, NDSS.

[16]  Christopher Krügel,et al.  Prospex: Protocol Specification Extraction , 2009, 2009 30th IEEE Symposium on Security and Privacy.

[17]  Thomas W. Reps,et al.  Checking conformance of a producer and a consumer , 2011, ESEC/FSE '11.

[18]  Xiangyu Zhang,et al.  Deriving input syntactic structure from execution , 2008, SIGSOFT '08/FSE-16.

[19]  Xuxian Jiang,et al.  Automatic Protocol Format Reverse Engineering through Context-Aware Monitored Execution , 2008, NDSS.

[20]  Christopher Krügel,et al.  Inspector Gadget: Automated Extraction of Proprietary Gadgets from Malware Binaries , 2010, 2010 IEEE Symposium on Security and Privacy.

[21]  Thomas W. Reps,et al.  DIVINE: DIscovering Variables IN Executables , 2007, VMCAI.

[22]  Dawn Xiaodong Song,et al.  Dispatcher: enabling active botnet infiltration using automatic protocol reverse-engineering , 2009, CCS.