Structure-driven optimizations for amorphous data-parallel programs

Irregular algorithms are organized around pointer-based data structures such as graphs and trees, and they are ubiquitous in applications. Recent work by the Galois project has provided a systematic approach to parallelizing irregular applications based on optimistic, or speculative, execution. However, the overhead of optimistic parallel execution can be substantial. In this paper, we show that many irregular algorithms have structure that can be exploited, and we present three key optimizations that take advantage of this algorithmic structure to reduce speculative overheads. We describe the implementation of these optimizations in the Galois system and present experimental results that demonstrate their benefits. To the best of our knowledge, this is the first system to exploit algorithmic structure to optimize the execution of irregular programs.
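The abstract's premise can be illustrated with a small single-threaded model. Speculative execution pays for conflict detection and rollback, and an operator with the right structure can avoid part of that cost. This is a hypothetical sketch, not the Galois API. `Executor`, `run`, `eager_op`, and `cautious_op` are names invented here. Concurrency is modeled by letting every iteration in a batch hold its locks until the round ends. The eager operator interleaves lock acquisition with writes, so an abort must replay an undo log. The variant labeled `cautious_op` acquires every lock in its neighborhood before writing anything, so an abort can only happen before any write and no undo log is needed. This is offered as one example of exploitable structure, not as a description of the paper's three optimizations.

```python
from collections import deque


class Conflict(Exception):
    """Raised when an iteration touches a node locked by another in-flight iteration."""


class Executor:
    """Hypothetical single-threaded model of optimistic (speculative) execution.

    Iterations in one batch are treated as concurrent: each keeps the locks it
    acquires until the round ends, so a later iteration that touches one of
    those nodes detects a conflict and aborts.
    """

    def __init__(self, values, neighbors):
        self.values = values        # node -> integer payload
        self.neighbors = neighbors  # node -> list of adjacent nodes
        self.owner = {}             # node -> index of the iteration holding its lock

    def acquire(self, it, node):
        # The first iteration to touch a node claims it; any other iteration conflicts.
        if self.owner.setdefault(node, it) != it:
            raise Conflict(node)

    def eager_op(self, it, node, undo):
        # Locks and writes are interleaved, so a conflict can strike after
        # some writes have happened; every write is logged for rollback.
        for m in [node] + self.neighbors[node]:
            self.acquire(it, m)
            undo.append((m, self.values[m]))
            self.values[m] += 1

    def cautious_op(self, it, node):
        hood = [node] + self.neighbors[node]
        for m in hood:  # read phase: acquire every lock before any write
            self.acquire(it, m)
        for m in hood:  # write phase: all locks are held, so no conflict and no undo log
            self.values[m] += 1

    def run_round(self, batch, cautious):
        """Execute one batch 'concurrently'; return the active nodes that aborted."""
        self.owner.clear()  # the previous round committed, so all locks are released
        aborted = []
        for it, node in enumerate(batch):
            undo = []
            try:
                if cautious:
                    self.cautious_op(it, node)
                else:
                    self.eager_op(it, node, undo)
            except Conflict:
                for m, old in reversed(undo):  # roll back partial writes
                    self.values[m] = old
                for m in [m for m, h in self.owner.items() if h == it]:
                    del self.owner[m]          # release the aborted iteration's locks
                aborted.append(node)
        return aborted


def run(values, neighbors, worklist, batch_size, cautious):
    """Drain the worklist, re-queuing aborted work; return (values, rounds)."""
    ex = Executor(values, neighbors)
    work = deque(worklist)
    rounds = 0
    while work:
        batch = [work.popleft() for _ in range(min(batch_size, len(work)))]
        work.extend(ex.run_round(batch, cautious))
        rounds += 1
    return values, rounds
```

Consider the path graph `{0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}` with every node active and a batch size of 4. Both variants finish in three rounds with values `{0: 2, 1: 3, 2: 3, 3: 2}`. The first iteration of each round always commits, so the loop terminates. In the eager run, the aborted iteration for node 2 has already written node 2 before it conflicts on node 1, and its undo log restores that write. The cautious run detects the same conflict before writing, so it never needs a rollback.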
