POWER: Parallel Optimizations With Executable Rewriting

The hardware industry’s rapid development of multicore and many-core hardware has outpaced the software industry’s transition from sequential to parallel programs. Most applications are still sequential, and many cores on parallel machines remain unused. We propose a tool that uses data-dependence profiling and binary rewriting to parallelize executables without access to source code. Our technique uses Bernstein’s conditions to identify independent sets of basic blocks that can be executed in parallel, introducing a level of granularity between fine-grained instruction-level parallelism and coarse-grained task-level parallelism. We analyze dynamically generated control and data dependence graphs to find these independent sets of basic blocks, and we then propose to parallelize the resulting candidates using binary rewriting techniques. Our technique aims to demonstrate the parallelism that remains in serial applications by exposing concrete opportunities for paral-
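
As an illustration of the independence test named above, the following is a minimal sketch of Bernstein's conditions applied to pairs of basic blocks. It assumes each block has already been summarized by the set of locations it reads and the set it writes (for example, from a data-dependence profile). The `BasicBlock` type, the function names, and the sample blocks are hypothetical and are not taken from the tool itself. Control dependences, which the approach also analyzes, are not modeled here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class BasicBlock:
    """Hypothetical summary of a basic block: the locations it reads and writes."""
    name: str
    reads: frozenset
    writes: frozenset


def bernstein_independent(a: BasicBlock, b: BasicBlock) -> bool:
    """Return True if a and b satisfy Bernstein's conditions.

    Two blocks may run in parallel when there is no flow dependence
    (a writes what b reads), no anti-dependence (a reads what b writes),
    and no output dependence (both write the same location).
    """
    return (
        a.writes.isdisjoint(b.reads)
        and a.reads.isdisjoint(b.writes)
        and a.writes.isdisjoint(b.writes)
    )


def independent_pairs(blocks):
    """List the names of all block pairs that pass the test."""
    return [
        (a.name, b.name)
        for a, b in combinations(blocks, 2)
        if bernstein_independent(a, b)
    ]


# Illustrative read/write sets (assumed, not measured).
blocks = [
    BasicBlock("B1", frozenset({"x"}), frozenset({"y"})),
    BasicBlock("B2", frozenset({"z"}), frozenset({"w"})),
    BasicBlock("B3", frozenset({"y"}), frozenset({"x"})),
]

pairs = independent_pairs(blocks)
print(pairs)  # [('B1', 'B2'), ('B2', 'B3')]
```

In this example B1 and B3 are rejected because B1 writes `y`, which B3 reads, while B2 touches locations that neither of the other blocks uses.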
