Software behavior oriented parallelization

Many sequential applications are difficult to parallelize because of unpredictable control flow, indirect data access, and input-dependent parallelism. These difficulties led us to build a software system for behavior-oriented parallelization (BOP), which allows a program to be parallelized based on partial information about program behavior, for example, from a user reading just part of the source code, or from a profiling tool examining only one or a few executions. The basis of BOP is programmable software speculation, in which a user or an analysis tool marks possibly parallel regions in the code and the run-time system executes these regions speculatively. Because the marked regions may turn out not to be parallel, the entire address space must be protected during speculation. The main goal of the paper is to demonstrate that this general protection can be made cost-effective by three novel techniques: programmable speculation, critical-path minimization, and value-based correctness checking. On a recently acquired multi-core, multi-processor PC, the BOP system reduced the end-to-end execution time by integer factors for a Lisp interpreter, a data compressor, a language parser, and a scientific library, with no change to the underlying hardware or operating system.
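To make the idea of speculation with value-based checking concrete, the following is a minimal sketch in Python, not the BOP implementation. It assumes a program state held in a dictionary, and the names `run_with_speculation`, `lead`, `spec`, and `spec_reads` are hypothetical. A thread and deep copies stand in for whatever isolation mechanism the real system uses. The speculative region runs on a private snapshot while the preceding region runs on the real state; the speculative result is kept only if every value it depended on is unchanged afterwards, and otherwise the region is simply re-executed.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


def run_with_speculation(state, lead, spec, spec_reads):
    """Run `lead` on the real state while `spec` runs on a private snapshot.

    `lead` and `spec` take a dict and return a dict of updates.
    `spec_reads` lists the keys that `spec` depends on (assumed known here).
    Returns True if the speculative result was committed, False if `spec`
    had to be re-executed after `lead`.
    """
    snapshot = copy.deepcopy(state)  # values as seen by the speculation
    with ThreadPoolExecutor(max_workers=1) as pool:
        # The speculative region only ever touches its own copy.
        future = pool.submit(spec, copy.deepcopy(snapshot))
        state.update(lead(state))    # non-speculative region, real state
        spec_updates = future.result()

    # Value-based check: compare values, not write events. If `lead`
    # rewrote a key with the same value, the speculation is still valid.
    committed = all(snapshot.get(k) == state.get(k) for k in spec_reads)
    if not committed:
        spec_updates = spec(state)   # fall back to sequential re-execution
    state.update(spec_updates)
    return committed


if __name__ == "__main__":
    s = {"a": 1, "b": 2}
    ok = run_with_speculation(
        s,
        lead=lambda d: {"a": d["a"] + 1},
        spec=lambda d: {"c": d["b"] * 10},
        spec_reads=["b"],
    )
    print(ok, s)
```

In either outcome the final state equals what sequential execution would produce; a failed speculation costs only time, which is why protecting the real state from the speculative region matters.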
