Towards Automatic Inference of Kernel Object Semantics from Binary Code

This paper presents Argos, the first system that can automatically uncover the semantics of kernel objects directly from a kernel binary. Based on the principle that data use reveals data semantics, Argos starts from the execution of system calls (i.e., the user-level application interface) and exported kernel APIs (i.e., the kernel module development interface). It automatically tracks how each instruction accesses kernel objects and assigns a bit-vector to each observed kernel object. This bit-vector encodes which system calls access the object and how the object is accessed (e.g., read, write, create, destroy). From the bit-vector, we derive the meaning of the kernel object using a set of rules developed according to the general understanding of OS kernels. Experimental results with Linux kernels show that Argos is able to recognize the semantics of the kernel objects of interest, and can even directly pinpoint important kernel data structures, such as the process descriptor and the memory descriptor, across different kernels. We have applied Argos to recognize internal kernel functions by using the inferred kernel objects, and we demonstrate that with Argos we can build a more precise kernel event tracking system by hooking these internal functions.
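The bit-vector idea can be illustrated with a minimal sketch. The abstract does not give Argos's actual encoding or rule set, so everything below is an assumption made for illustration: the bit layout, the short system call list, the object names, the two rules, and the helper names `bit_index`, `record`, `has`, and `infer` are all hypothetical.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access kinds named in the abstract; the numbering is an assumption."""
    CREATE = 0
    READ = 1
    WRITE = 2
    DESTROY = 3


# Hypothetical, heavily shortened system call list (a real kernel has hundreds).
SYSCALLS = ["fork", "exit", "open", "close", "mmap", "munmap"]


def bit_index(syscall: str, access: Access) -> int:
    """Map a (system call, access kind) pair to one bit position."""
    return SYSCALLS.index(syscall) * len(Access) + int(access)


def record(vectors: dict, obj: str, syscall: str, access: Access) -> None:
    """Set the bit for an observed access to a kernel object."""
    vectors[obj] = vectors.get(obj, 0) | (1 << bit_index(syscall, access))


def has(vector: int, syscall: str, access: Access) -> bool:
    """Check whether a given (system call, access kind) bit is set."""
    return bool((vector >> bit_index(syscall, access)) & 1)


def infer(vector: int) -> str:
    """Toy rules in the spirit of 'data use reveals data semantics'.

    These two rules are illustrative guesses, not the rules used by Argos.
    """
    if has(vector, "fork", Access.CREATE) and has(vector, "exit", Access.DESTROY):
        return "process-descriptor-like"
    if has(vector, "open", Access.CREATE) and has(vector, "close", Access.DESTROY):
        return "file-object-like"
    return "unknown"


# Simulated trace of observed accesses (object names are placeholders).
vectors = {}
record(vectors, "obj_A", "fork", Access.CREATE)
record(vectors, "obj_A", "exit", Access.DESTROY)
record(vectors, "obj_B", "open", Access.CREATE)
record(vectors, "obj_B", "close", Access.DESTROY)
record(vectors, "obj_C", "mmap", Access.READ)

labels = {obj: infer(vec) for obj, vec in vectors.items()}
print(labels)
```

Packing the observations into one integer per object keeps rule checks down to simple bit tests; a real system would need a much larger vector and would have to track accesses at instruction granularity, which this sketch does not attempt.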
