Paper information - Single Operation Multiple Data - Data Parallelism at Subroutine Level

The parallel nature of multi-core architectures can only be fully exploited by concurrent applications. This status quo has pushed productivity to the forefront of language design concerns. The community is demanding new solutions in the design, compilation, and implementation of concurrent languages, making this research area one of great importance and impact. To that end, this paper proposes expressing data parallelism at the subroutine level. Calling a subroutine in this context spawns several execution flows, each operating on a distinct partition of the input dataset. Such computations can be expressed by simply annotating sequential subroutines with data distribution and reduction policies, delegating the management of the parallel execution to a dedicated runtime system. The paper gives an overview of the key concepts of the model, illustrates them with small programming examples, and describes a Java implementation built on top of the X10 [2] runtime system. A performance evaluation indicates that this approach can provide good performance gains without burdening the programmer with writing specialized code.
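The execution model described in the abstract can be approximated in plain Java as a rough sketch. It does not reproduce the paper's annotation syntax or the X10 runtime: the class `SomdSketch`, the methods `sumRange` and `parallelSum`, the block distribution, and the sum reduction are all hypothetical names and choices made for illustration, with `java.util.concurrent` standing in for the dedicated runtime system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only: none of these names come from the paper,
// and a plain thread pool stands in for the X10-based runtime.
final class SomdSketch {

    // The unmodified sequential subroutine: sums a[from..to).
    static long sumRange(int[] a, int from, int to) {
        long s = 0;
        for (int i = from; i < to; i++) {
            s += a[i];
        }
        return s;
    }

    // Stand-in for the runtime: splits the input into contiguous blocks
    // (assumed distribution policy), runs one execution flow per block,
    // and adds the partial results (assumed reduction policy).
    static long parallelSum(int[] a, int flows) {
        ExecutorService pool = Executors.newFixedThreadPool(flows);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (a.length + flows - 1) / flows; // ceiling division
            for (int f = 0; f < flows; f++) {
                final int from = Math.min(a.length, f * chunk);
                final int to = Math.min(a.length, from + chunk);
                Callable<Long> flow = () -> sumRange(a, from, to);
                partials.add(pool.submit(flow));
            }
            long total = 0;
            for (Future<Long> p : partials) {
                total += p.get(); // reduction step
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = new int[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = i + 1;
        }
        System.out.println(parallelSum(data, 4)); // prints 500500
    }
}
```

In this sketch the caller invokes `parallelSum` as if it were an ordinary subroutine, while partitioning, scheduling, and reduction stay hidden behind the call. In the model the abstract describes, that glue would be derived from annotations on the sequential subroutine rather than written by hand.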

Hervé Paulino | Eduardo R. B. Marques | Eduardo Marques

[1] Katherine Yelick, et al. Introduction to UPC and Language Specification, 2000.

[2] Vivek Sarkar, et al. X10: an object-oriented approach to non-uniform cluster computing, 2005, OOPSLA '05.

[3] Sanjay Ghemawat, et al. MapReduce: Simplified Data Processing on Large Clusters, 2004, OSDI.

[4] Hans P. Zima, et al. The cascade high productivity language, 2004.

[5] Katherine A. Yelick, et al. Titanium: A High-performance Java Dialect, 1998, Concurr. Pract. Exp.

[6] James Reinders, et al. Intel® threading building blocks, 2008.

[7] Message Passing Interface Forum. MPI: A message-passing interface standard, 1994.

[8] Michael R. Clarkson, et al. Polyglot: An Extensible Compiler Framework for Java, 2003, CC.

[9] Douglas C. Schmidt, et al. Active object: an object behavioral pattern for concurrent programming, 1996.

[10] C. H. Flood, et al. The Fortress Language Specification, 2007.

[11] Sreedhar B. Kodali, et al. The Asynchronous Partitioned Global Address Space Model, 2010.

[12] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.

[13] Hervé Paulino, et al. Di_pSystem: A Parallel Programming System for Distributed Memory Architectures, 1999, PVM/MPI.

[15] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.