OoOJava: software out-of-order execution

Developing parallel software with current tools is challenging. Even experts find it difficult to reason about the use of locks and often accidentally introduce race conditions and deadlocks into parallel software. OoOJava is a compiler-assisted approach that combines developer annotations with static analysis to provide an easy-to-use, deterministic parallel programming model. OoOJava extends Java with a task annotation that instructs the compiler to consider a code block for out-of-order execution. OoOJava executes each task as soon as its data dependences are resolved, and it guarantees that the execution of an annotated program preserves the exact semantics of the original sequential program. We have implemented OoOJava and achieved an average speedup of 16.6x across our ten benchmarks.
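The central mechanism is to start a task once the tasks it depends on have finished, while still producing the sequential program's results. It can be illustrated in plain Java. The sketch below is not OoOJava's annotation syntax or runtime. OoOJava infers dependences through static analysis, whereas here each task declares its read and write sets by hand. The `DepScheduler` class, its `submit` method, and the variable names are hypothetical and exist only for illustration.

```java
import java.util.*;
import java.util.concurrent.*;

/** Hypothetical sketch (not OoOJava's runtime): tasks are submitted in program
 *  order with hand-declared read/write sets (OoOJava would infer these
 *  statically). Each task starts as soon as every earlier conflicting task has
 *  completed, so final results match the sequential program. */
public class DepScheduler {
    private record Submitted(Set<String> reads, Set<String> writes,
                             CompletableFuture<Void> done) {}

    private final List<Submitted> history = new ArrayList<>();
    private final ExecutorService pool = Executors.newFixedThreadPool(4);

    /** Submit a task in program order; it waits only on earlier conflicting tasks. */
    public CompletableFuture<Void> submit(Set<String> reads, Set<String> writes, Runnable body) {
        List<CompletableFuture<Void>> deps = new ArrayList<>();
        for (Submitted prev : history) {
            boolean raw = !Collections.disjoint(prev.writes(), reads);  // read-after-write
            boolean war = !Collections.disjoint(prev.reads(), writes);  // write-after-read
            boolean waw = !Collections.disjoint(prev.writes(), writes); // write-after-write
            if (raw || war || waw) deps.add(prev.done());
        }
        // allOf over an empty array completes immediately, so independent tasks start at once.
        CompletableFuture<Void> done = CompletableFuture
                .allOf(deps.toArray(new CompletableFuture[0]))
                .thenRunAsync(body, pool);
        history.add(new Submitted(reads, writes, done));
        return done;
    }

    /** Wait for every submitted task, then stop the worker threads. */
    public void shutdown() {
        CompletableFuture.allOf(history.stream().map(Submitted::done)
                .toArray(CompletableFuture[]::new)).join();
        pool.shutdown();
    }

    /** Sequential program: x = 1; y = x + 1; z = 10; x = y * 2; */
    static Map<String, Integer> demo() {
        Map<String, Integer> mem = new ConcurrentHashMap<>();
        DepScheduler s = new DepScheduler();
        s.submit(Set.of(), Set.of("x"), () -> mem.put("x", 1));
        s.submit(Set.of("x"), Set.of("y"), () -> mem.put("y", mem.get("x") + 1));
        s.submit(Set.of(), Set.of("z"), () -> mem.put("z", 10)); // independent: may run early
        s.submit(Set.of("y"), Set.of("x"), () -> mem.put("x", mem.get("y") * 2));
        s.shutdown();
        return mem;
    }

    public static void main(String[] args) {
        Map<String, Integer> mem = demo();
        System.out.println(mem.get("x") + " " + mem.get("y") + " " + mem.get("z")); // prints 4 2 10
    }
}
```

Each task waits only on earlier tasks with which it has a read-after-write, write-after-read, or write-after-write conflict. The `z` task may therefore run before or alongside the others. Even so, the final values always equal those of the sequential program, which is the determinism property this sketch is meant to mirror.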