Toward Automatic Data Structure Replacement for Effective Parallelization

Data structures define how the values a program computes are stored and accessed. When the data structures used in an application are known, tools can make the application more robust by enforcing data structure consistency properties, and developers can more easily understand the application and modify it to suit its target architecture. This paper presents the design and application of DDT, a new program analysis tool that automatically identifies the data structures within an application. DDT instruments an application binary to dynamically monitor how data is stored and organized for a set of sample inputs. The instrumentation detects which functions interact with the stored data and creates a signature for these functions using dynamic invariant detection. The invariants of these functions are then matched against a library of known data structures, yielding a probable identification. In other words, DDT uses program consistency properties to identify which data structures an application employs. The empirical evaluation shows that this technique is highly accurate across several different implementations of standard data structures.
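The idea of matching observed invariants against a library of known signatures can be illustrated with a small toy sketch. This is not DDT's implementation: it works on a hand-written trace of (operation, value) pairs rather than an instrumented binary, and the names `infer_invariants`, `identify`, `KNOWN_SIGNATURES`, and the two invariants `removes_newest` and `removes_oldest` are all hypothetical, chosen only for illustration. It also assumes that every value in a trace is distinct.

```python
# Toy sketch (assumption: not DDT's actual algorithm or signature format).
# A trace is a list of (op, value) pairs, where op is "insert" or "remove"
# and value is the element inserted or the element the call returned.

def infer_invariants(trace):
    """Return the set of hypothetical invariants that held on every remove."""
    pending = []  # values inserted but not yet removed, oldest first
    holds = {"removes_newest": True, "removes_oldest": True}
    saw_remove = False
    for op, value in trace:
        if op == "insert":
            pending.append(value)
        elif op == "remove":
            if value not in pending:
                raise ValueError(f"removed value {value!r} was never inserted")
            saw_remove = True
            # An invariant survives only if it holds for every observation.
            holds["removes_newest"] = holds["removes_newest"] and pending[-1] == value
            holds["removes_oldest"] = holds["removes_oldest"] and pending[0] == value
            pending.remove(value)
        else:
            raise ValueError(f"unknown operation {op!r}")
    if not saw_remove:
        return frozenset()  # no evidence about removal order
    return frozenset(name for name, ok in holds.items() if ok)

# Hypothetical signature library: the invariants each structure must satisfy.
KNOWN_SIGNATURES = {
    "stack": frozenset({"removes_newest"}),
    "queue": frozenset({"removes_oldest"}),
}

def identify(trace):
    """Return every known structure whose signature the trace satisfies."""
    observed = infer_invariants(trace)
    return sorted(name for name, sig in KNOWN_SIGNATURES.items() if sig <= observed)

stack_trace = [("insert", 1), ("insert", 2), ("remove", 2),
               ("insert", 3), ("remove", 3), ("remove", 1)]
queue_trace = [("insert", 1), ("insert", 2), ("remove", 1), ("remove", 2)]
short_trace = [("insert", 1), ("remove", 1)]

print(identify(stack_trace))
print(identify(queue_trace))
print(identify(short_trace))
```

The last trace never holds more than one element, so it satisfies both signatures and the sketch reports both candidates. This mirrors why an identification based on sample inputs is only probable: the result depends on how well those inputs exercise the structure.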
