Static Analysis for Checking Data Format Compatibility of Programs

Large software systems are developed by composing multiple programs. If the programs manipulate and exchange complex data, such as network packets or files, it is essential to establish that they follow compatible data formats. Most of the complexity of such data formats lies in their headers. In this paper, we address the compatibility of programs that operate on the headers of network packets, files, images, and similar data. Because format specifications are rarely available, we infer, for each program, the format of the headers it manipulates as a set of guarded layouts. Using these inferred formats, we define and check compatibility of (a) producer-consumer programs and (b) different versions of producer (or consumer) programs. A compatible producer-consumer pair is free of type mismatches and of logical incompatibilities, such as the consumer rejecting valid outputs generated by the producer. A backward compatible producer (resp. consumer) is guaranteed to be compatible with every consumer (resp. producer) that was compatible with its older version. With our prototype tool, we identified 5 known bugs and 1 potential bug in (a) the sender-receiver modules of Linux network drivers from 3 vendors and (b) different versions of a TIFF image library.
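To make "guarded layouts" and the two compatibility notions more concrete, here is a minimal Python sketch under strongly simplifying assumptions. Each guard is an integer interval over a single header tag field, a layout is a set of (byte offset, type name) pairs, and formats are written down by hand rather than inferred by static analysis. The names `GuardedLayout`, `layouts_for`, `check_producer_consumer`, and `producer_backward_compatible` are hypothetical and do not come from the paper's tool. The backward-compatibility test is only one simple sufficient condition in this toy model, not the paper's definition or algorithm.

```python
from dataclasses import dataclass

# Hypothetical, simplified model (not the paper's representation):
# a guard is a closed interval [lo, hi] over one tag field in the header,
# and a layout maps byte offsets to field-type names.


@dataclass(frozen=True)
class GuardedLayout:
    lo: int
    hi: int
    layout: tuple  # ((offset, type_name), ...)

    def tags(self):
        """Tag values for which this layout applies."""
        return range(self.lo, self.hi + 1)


def layouts_for(fmt, tag):
    """All layouts (as offset -> type dicts) a format associates with a tag value."""
    return [dict(g.layout) for g in fmt if g.lo <= tag <= g.hi]


def check_producer_consumer(producer, consumer):
    """Report tags the consumer rejects and type mismatches on shared offsets."""
    problems = []
    for p in producer:
        p_types = dict(p.layout)
        for tag in p.tags():
            accepted = layouts_for(consumer, tag)
            if not accepted:
                # A valid producer output that the consumer does not accept.
                problems.append(("rejected", tag))
            for c_types in accepted:
                # Both sides interpret these offsets; their types must agree.
                for off in sorted(p_types.keys() & c_types.keys()):
                    if p_types[off] != c_types[off]:
                        problems.append(("type mismatch", tag, off))
    return problems


def producer_backward_compatible(old, new):
    """Sufficient check in this toy model: every (tag, layout) pair the new
    producer may emit was already emitted by the old producer."""
    for g in new:
        for tag in g.tags():
            if dict(g.layout) not in layouts_for(old, tag):
                return False
    return True


# Usage: the producer emits tags 1..2 with a u16 field at offset 2,
# while the consumer accepts only tag 1 and reads offset 2 as u32.
prod = [GuardedLayout(1, 2, ((0, "u8"), (2, "u16")))]
cons = [GuardedLayout(1, 1, ((0, "u8"), (2, "u32")))]
print(check_producer_consumer(prod, cons))
# [('type mismatch', 1, 2), ('rejected', 2)]
```

In the usage lines, the consumer both misreads the field at offset 2 and rejects tag 2, which the producer can legitimately emit; these mirror the two kinds of incompatibility named above. Enumerating tag values one by one keeps the sketch short but only scales to small guard domains; a real analysis would reason about guards symbolically.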
