Finish Accumulators: A Deterministic Reduction Construct for Dynamic Task Parallelism

Parallel reductions are a common pattern for aggregating data supplied by parallel tasks using an associative and commutative operation, such as summation. In this paper, we introduce finish accumulators, a unified construct that supports predefined and user-defined deterministic reductions for dynamic async-finish task parallelism. Finish accumulators are designed to be integrated into terminally strict models of task parallelism, as in the X10 and Habanero-Java (HJ) languages, which are more general than the fully strict models of task parallelism found in Cilk and OpenMP. In contrast to lower-level reduction constructs such as atomic variables, the high-level semantics of finish accumulators allows for a wide range of implementations with different accumulation policies, e.g., eager computation vs. lazy computation. The best implementation can thus be selected for a given application and the target platform on which it will execute. We have integrated finish accumulators into the Habanero-Java task-parallel language and used them in both research and teaching. In addition to their higher-level semantics, experimental results demonstrate that our Java-based implementation of finish accumulators delivers comparable or better performance for reductions relative to Java’s atomic variables and concurrent collection libraries.
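To make the idea concrete, the following is a minimal sketch of a sum reduction in plain Java; it is not the Habanero-Java API. The names `SumAccumulator`, `put`, `get`, `close`, and `parallelSum` are hypothetical and chosen only for illustration. An `ExecutorService` stands in for async tasks, and waiting on every `Future` stands in for the end of the finish scope. The sketch assumes one possible accumulation policy, in which `java.util.concurrent.atomic.LongAdder` spreads contributions over internal cells and combines them only when the result is read.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.LongAdder;

public class FinishAccumulatorSketch {

    /** Hypothetical sum accumulator; an illustration, not the HJ construct. */
    static final class SumAccumulator {
        // LongAdder keeps several internal cells and combines them in sum().
        private final LongAdder cells = new LongAdder();
        private volatile boolean finished = false;

        /** Called by tasks inside the simulated finish scope. */
        void put(long value) {
            if (finished) {
                throw new IllegalStateException("put after the finish scope ended");
            }
            cells.add(value);
        }

        /** Marks the end of the simulated finish scope. */
        void close() {
            finished = true;
        }

        /** The result is readable only once every contributing task has ended. */
        long get() {
            if (!finished) {
                throw new IllegalStateException("get before the finish scope ended");
            }
            return cells.sum();
        }
    }

    /** Sums 1..n using the given number of tasks, each taking a strided share. */
    static long parallelSum(int n, int tasks) throws Exception {
        SumAccumulator acc = new SumAccumulator();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < tasks; t++) {
                final int start = t + 1;
                pending.add(pool.submit(() -> {
                    for (int i = start; i <= n; i += tasks) {
                        acc.put(i);
                    }
                }));
            }
            // Waiting on every task plays the role of the end of a finish scope.
            for (Future<?> f : pending) {
                f.get();
            }
            acc.close();
        } finally {
            pool.shutdown();
        }
        return acc.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(parallelSum(1000, 8)); // prints 500500
    }
}
```

Because integer addition is associative and commutative, the result does not depend on the order in which tasks contribute. Rejecting `get` before the scope closes mirrors, in a simplified runtime-checked form, the idea that a reduction result is only observed once all contributing tasks have terminated.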
