A Scalable Mixed-Level Approach to Dynamic Analysis of C and C++ Programs

This thesis addresses the difficult task of constructing robust and scalable dynamic program analysis tools for programs written in memory-unsafe languages such as C and C++, especially tools that observe the contents of data structures at run time. In this thesis, I first introduce my novel mixed-level approach to dynamic analysis, which combines the advantages of both source- and binary-based approaches. Second, I present a tool framework that embodies the mixed-level approach. This framework provides memory safety guarantees, allows tools built upon it to access rich source- and binary-level information simultaneously at run time, and enables tools to scale to large, real-world C and C++ programs on the order of millions of lines of code. Third, I present two dynamic analysis tools built upon my framework (one for performing value profiling and the other for performing dynamic inference of abstract types) and describe how they far surpass previous analyses in terms of scalability, robustness, and applicability. Lastly, I present several case studies demonstrating how these tools aid both humans and automated tools in several program analysis tasks: improving human understanding of unfamiliar code, invariant detection, and data structure repair.

Thesis Supervisor: Michael D. Ernst
Title: Associate Professor
