Identifying Inter-task Communication in Shared Memory Programming Models

Modern computers often use multi-core architectures, ranging from clusters of homogeneous cores for high-performance computing to the heterogeneous architectures typically found in embedded systems. To program such architectures efficiently, it is important to be able to partition programs and map them onto the cores of the architecture. We believe that communication patterns need to become explicit in the source code to make parallel programs easier to analyze and partition. Extracting these patterns is difficult to automate because compiler techniques are limited in their ability to determine the effects of pointers. In this paper, we propose an OpenMP extension that allows programmers to explicitly declare pointer-based data sharing between coarse-grain program parts. We present a dependency directive, which expresses the input and output relations between program parts and pointers to shared data, as well as a set of runtime operations that are necessary to enforce the declarations made by the programmer. The cost and scalability of the runtime operations are evaluated using micro-benchmarks and a benchmark from the NAS parallel benchmark suite. The measurements show that the overhead of the runtime operations is small. In fact, no performance degradation is found when the runtime operations are used in the benchmark from the NAS parallel benchmark suite.
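The syntax of the proposed directive is not given in this section, so the following is only a rough sketch of the general idea of declaring input and output relations on pointer-based shared data. It uses the standard OpenMP `depend` clause on tasks (available since OpenMP 4.0) as a stand-in, not the extension proposed in the paper. The names `produce`, `consume`, and `run_pipeline` are hypothetical and chosen for illustration.

```c
#include <stdlib.h>

#define N 8

/* Hypothetical producer: writes the shared buffer through a pointer. */
static void produce(int *buf, int n) {
    for (int i = 0; i < n; i++) {
        buf[i] = i + 1;
    }
}

/* Hypothetical consumer: reads the shared buffer and returns the sum. */
static int consume(const int *buf, int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += buf[i];
    }
    return sum;
}

/* Runs the two program parts as tasks. The depend clauses state which part
 * writes and which part reads the data behind the pointer, so the runtime
 * orders the consumer after the producer. Returns -1 if allocation fails. */
int run_pipeline(void) {
    int *buf = malloc(N * sizeof *buf);
    int sum = 0;
    if (buf == NULL) {
        return -1;
    }

    #pragma omp parallel
    #pragma omp single
    {
        /* Output relation: this task writes buf[0..N-1]. */
        #pragma omp task depend(out: buf[0:N]) shared(buf)
        produce(buf, N);

        /* Input relation: this task reads buf[0..N-1] and writes sum. */
        #pragma omp task depend(in: buf[0:N]) depend(out: sum) shared(buf, sum)
        sum = consume(buf, N);

        /* Wait for both tasks before leaving the single region. */
        #pragma omp taskwait
    }

    free(buf);
    return sum;
}
```

Compiled without OpenMP support, the pragmas are ignored and the two parts simply run in program order, which gives the same result. With OpenMP enabled, the declared input and output relations are what make the communication between the two parts visible to the runtime.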
