Optimistic parallelism requires abstractions

The problem of writing software for multicore processors would be greatly simplified if we could automatically parallelize sequential programs. Although auto-parallelization has been studied for many decades, it has succeeded only in a few application areas, such as dense matrix computations. In particular, auto-parallelization of irregular programs, which are organized around large, pointer-based data structures like graphs, has seemed intractable. The Galois project is taking a fresh look at auto-parallelization. Rather than attempt to parallelize all programs no matter how obscurely they are written, we are designing programming abstractions that permit programmers to highlight opportunities for exploiting parallelism in sequential programs, and building a runtime system that uses these hints to execute the program in parallel. In this paper, we describe the design and implementation of a system based on these ideas. Experimental results for two real-world irregular applications, a Delaunay mesh refinement application and a graphics application that performs agglomerative clustering, demonstrate that this approach is promising.
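To make the programming model concrete, the sketch below illustrates the kind of sequential code a programmer might write under this approach: an ordinary loop over an unordered worklist whose iterations may add new work, which a Galois-style runtime would then be free to execute speculatively in parallel, rolling back iterations that conflict. This is a minimal sketch under stated assumptions; the names used here (Triangle, isBad, the worklist itself) are illustrative, not the actual Galois API, and the loop body is a stand-in for real mesh refinement.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the abstraction the paper describes: the
// programmer writes a plain sequential loop over an unordered worklist;
// because iteration order is unspecified, a speculative runtime may run
// several iterations concurrently and undo any that conflict.
public class WorklistSketch {
    static final class Triangle {
        final int id;
        Triangle(int id) { this.id = id; }
        // Stand-in for a geometric quality test on the triangle.
        boolean isBad() { return id % 3 == 0; }
    }

    public static void main(String[] args) {
        Deque<Triangle> worklist = new ArrayDeque<>();
        for (int i = 0; i < 10; i++) worklist.add(new Triangle(i));

        // Sequential semantics: take items in any order; processing a
        // "bad" item may create new work, which goes back on the list.
        while (!worklist.isEmpty()) {
            Triangle t = worklist.poll();
            if (t.isBad() && t.id > 0) {
                // "Refining" a bad triangle yields smaller new work.
                worklist.add(new Triangle(t.id - 1));
            }
        }
        System.out.println("worklist drained");
    }
}
```

The hint to the runtime is the unordered iteration itself: since the programmer has asserted that any processing order is acceptable, iterations touching disjoint parts of the data structure can safely commit in parallel.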
