Program specialization and verification using file format specifications

Programs that process data residing in files are widely used in varied domains, such as banking, healthcare, and web-traffic analysis. Precise static analysis of these programs, in the context of software transformation and verification tasks, is a challenging problem. Our key insight is that static analysis of file-processing programs becomes more useful when knowledge of the programs' input file formats is made available to the analysis. We instantiate this idea to solve two practical problems: specializing the code of a program to a given “restricted” input file format, and verifying whether a program “conforms” to a given input file format. We then describe an implementation of our approach and report empirical results on a set of real and realistic programs. The results are encouraging in terms of both the scalability and the precision of the approach.
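To make the two problems concrete, the following minimal sketch shows what specialization and conformance mean on a toy example. It is an illustration only and not the static analysis described above: the record format, the function names, and the hand-written specialized version are all hypothetical, and the conformance check is a simple set comparison rather than a program analysis.

```python
# Hypothetical toy format: each line is "<type>,<amount>".
# Record types: "D" = deposit, "W" = withdrawal, "F" = fee.
FULL_FORMAT = {"D", "W", "F"}
RESTRICTED_FORMAT = {"D", "W"}  # assumed restricted format: no fee records


def process(lines):
    """Original program: handles every record type in FULL_FORMAT."""
    balance = 0
    for line in lines:
        kind, amount = line.split(",")
        if kind == "D":
            balance += int(amount)
        elif kind == "W":
            balance -= int(amount)
        elif kind == "F":
            balance -= int(amount) + 1  # fee plus a fixed surcharge of 1
        else:
            raise ValueError(f"unexpected record type: {kind}")
    return balance


def process_specialized(lines):
    """Hand-specialized to RESTRICTED_FORMAT.

    Under the restricted format the "F" and error branches are
    unreachable, so they are dropped.
    """
    balance = 0
    for line in lines:
        kind, amount = line.split(",")
        if kind == "D":
            balance += int(amount)
        else:  # only "W" can occur under the restricted format
            balance -= int(amount)
    return balance


# Record types that `process` accepts without raising (written by hand here;
# a real analysis would have to infer this from the code).
HANDLED_BY_PROCESS = {"D", "W", "F"}


def conforms(handled, file_format):
    """Toy conformance check: every record type the format allows is handled."""
    return file_format <= handled
```

On any input that respects the restricted format, `process` and `process_specialized` return the same balance, which is the property a specializer must preserve. The `conforms` helper only captures the intuition behind conformance; it says nothing about record ordering or field constraints, which a realistic file format specification would also express.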
