Simultaneous Inspection: Hiding the Overhead of Inspector-Executor Style Dynamic Parallelization

A common approach for dynamic parallelization of loops at runtime is the inspector-executor pattern. The inspector first runs the loop without any side effects to analyze whether there are data dependences that would prevent parallel execution. Only if no such dependences are found does the executor phase actually run the loop iterations in parallel. In previous work, the overhead of the inspection must either be amortized by the parallel execution or is completely wasted if the loop turns out to be non-parallelizable.
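To make the classic pattern concrete, the following is a minimal sketch of a conventional inspector-executor scheme, not the simultaneous inspection this paper proposes. It assumes a hypothetical loop body of the form `a[writes[i]] = a[reads[i]] + 1`; the names `inspect`, `body`, and `run_loop` are illustrative and do not come from the paper. Python threads are used only to show the control flow, since the interpreter lock prevents a real speedup for this kind of work.

```python
from concurrent.futures import ThreadPoolExecutor


def inspect(reads, writes):
    """Inspector: examine the access pattern without touching the data.

    Returns True if no element is written by one iteration and
    read or written by a different iteration.
    """
    writer = {}  # element index -> iteration that writes it
    for i, w in enumerate(writes):
        if w in writer:
            return False  # two iterations write the same element
        writer[w] = i
    for i, r in enumerate(reads):
        j = writer.get(r)
        if j is not None and j != i:
            return False  # one iteration reads what another writes
    return True


def body(a, reads, writes, i):
    """Hypothetical loop body with indirect (runtime-known) indices."""
    a[writes[i]] = a[reads[i]] + 1


def run_loop(a, reads, writes, workers=4):
    """Executor: run in parallel only if the inspector found no dependence."""
    n = len(writes)
    if inspect(reads, writes):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces completion and re-raises any worker exception
            list(pool.map(lambda i: body(a, reads, writes, i), range(n)))
        return "parallel"
    # Inspection time is wasted in this case: fall back to sequential order.
    for i in range(n):
        body(a, reads, writes, i)
    return "sequential"
```

In this sketch the inspection always runs to completion before any iteration executes, so its cost is either amortized by the parallel run or lost entirely on the sequential fallback, which is the overhead described above.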
