Local and Global Data Distribution in the Filaments Package

It is generally agreed that threads and shared memory provide a desirable parallel programming model. However, to achieve scalability it is often necessary to execute such programs on a distributed-memory multicomputer. The Filaments package provides an interface of threads and shared memory, implemented on a distributed-memory machine through a software distributed shared memory (DSM). The focus of this paper is the problem of finding effective data (thread) distributions both within and between phases, taking data redistribution into account, in scientific applications composed of multiple phases. Our approach, which is implemented within the Adapt run-time data distribution system, takes measurements on the first iteration of the outermost loop in the application and uses them to find the global distribution (over a reasonable set of distributions) that leads to the best completion time. Initial results are encouraging: for a flame code simulation whose behavior depends largely on input data, Adapt finds an effective global data distribution with reasonable overhead.
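To make the idea of a global distribution concrete, the following is a minimal sketch of one way such a search could be organized. It is not Adapt's actual algorithm, which the abstract does not describe. It assumes that a per-phase cost for each candidate distribution and a redistribution cost between each pair of distributions are already available (for example, from first-iteration measurements), and it uses a simple dynamic program over a single pass through the phases. The function name `best_global_distribution` and the tables `phase_cost` and `redist_cost` are hypothetical.

```python
def best_global_distribution(phase_cost, redist_cost):
    """Pick one distribution per phase to minimize total completion time.

    Hypothetical inputs (assumed to come from measurements):
      phase_cost[p][d]  : time of phase p under candidate distribution d
      redist_cost[a][b] : time to redistribute data from a to b (0 if a == b)

    Returns (total_time, plan), where plan[p] is the index of the
    distribution chosen for phase p.
    """
    n_phases = len(phase_cost)
    n_dists = len(phase_cost[0])

    # best[d]: cheapest time to finish the phases so far, ending in d.
    best = list(phase_cost[0])
    back = []  # back[p - 1][d]: distribution used in phase p - 1

    for p in range(1, n_phases):
        new_best, choices = [], []
        for d in range(n_dists):
            # Cheapest predecessor, including the cost of moving data to d.
            prev = min(range(n_dists),
                       key=lambda a: best[a] + redist_cost[a][d])
            new_best.append(best[prev] + redist_cost[prev][d]
                            + phase_cost[p][d])
            choices.append(prev)
        best = new_best
        back.append(choices)

    # Walk the back-pointers from the cheapest final distribution.
    last = min(range(n_dists), key=lambda d: best[d])
    plan = [last]
    for choices in reversed(back):
        plan.append(choices[plan[-1]])
    plan.reverse()
    return best[last], plan


# Illustrative (made-up) numbers: three phases, two candidate distributions.
phase_cost = [[10, 4], [3, 9], [10, 5]]
redist_cost = [[0, 2], [2, 0]]
total, plan = best_global_distribution(phase_cost, redist_cost)
print(total, plan)
```

With these made-up costs, choosing the locally best distribution in every phase is also the globally best plan even after paying for two redistributions. With higher redistribution costs, the same search would instead keep a single distribution across phases. The sketch ignores the redistribution needed when the outermost loop wraps from the last phase back to the first.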
