Upper time bounds for executing PRAM-programs on the LogP-machine

In sequential computing, the step from programming in machine code to programming in machine-independent high-level languages was taken decades ago. Although high-level programming languages are available for parallel machines, today's parallel programs still depend heavily on the architectures they are intended to run on. Designing efficient parallel programs is a difficult task that only specialists can perform, and porting such programs to other parallel architectures is nearly impossible without a considerable loss of performance. Abstract machine models for parallel computing such as the PRAM are accepted by theoreticians but have little practical relevance, since these models do not take the properties of existing architectures into account; the PRAM, however, is easy to program. Recently, Culler et al. defined the LogP machine model, which reflects the behaviour of massively parallel computers more accurately. In this work, we present transformations of a subclass of PRAM programs into efficient LogP programs and give upper bounds for executing them on the LogP machine. To this end, we first briefly summarize the transformations from PRAM to LogP programs. Second, we extend the LogP machine model by a set of machine instructions. Third, we define the classes of coarse-grained and fine-grained LogP programs: the former can be executed within a factor of two of the optimum, while the latter has a slightly worse upper time bound. Finally, we show how to decide statically which strategy is promising for a given program optimization problem.
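
As a point of reference for such bounds, the minimal sketch below illustrates the standard LogP cost accounting of Culler et al. [15]: a message costs o cycles of send overhead, L cycles of network latency and o cycles of receive overhead, and consecutive submissions by one processor are at least g cycles apart. The greedy broadcast it simulates follows the optimal LogP broadcast schedule of Karp et al. [10]; the function names and the parameter values in the usage example are illustrative assumptions and are not part of the machine-instruction extension defined in this paper.

```python
import heapq


def point_to_point_time(L: int, o: int) -> int:
    """Cycles until a single message is fully received:
    o (send overhead) + L (network latency) + o (receive overhead)."""
    return L + 2 * o


def greedy_broadcast_time(L: int, o: int, g: int, P: int) -> int:
    """Completion time of a greedy single-item broadcast under LogP.

    Every processor that already holds the item keeps forwarding it to
    processors that do not, submitting one message every max(g, o) cycles;
    a message submitted at time t is fully received at t + L + 2*o, and the
    receiver may start forwarding immediately.  Karp et al. [10] show that
    this greedy schedule is the optimal broadcast in the LogP model.
    """
    if P <= 1:
        return 0
    gap = max(g, o)          # minimum spacing of submissions per processor
    slots = [0]              # min-heap: next possible submission times
    informed = 1             # processor 0 holds the item at time 0
    arrival = 0
    while informed < P:
        t = heapq.heappop(slots)          # earliest free send slot
        arrival = t + L + 2 * o           # one more processor informed
        informed += 1
        heapq.heappush(slots, t + gap)    # sender's next send slot
        heapq.heappush(slots, arrival)    # new holder may forward at once
    return arrival                        # slots are popped in order, so the
                                          # last arrival is the latest one


if __name__ == "__main__":
    # Illustrative parameter values only (not taken from the paper).
    print(point_to_point_time(L=6, o=2))              # 10
    print(greedy_broadcast_time(L=6, o=2, g=4, P=8))  # 24
```

The priority queue of send slots implements exactly the "every informed processor forwards as early as possible" rule of [10]; with the illustrative values L = 6, o = 2, g = 4 and P = 8, the simulation reports a broadcast time of 24 cycles.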

[1] Richard P. Brent et al. The Parallel Evaluation of General Arithmetic Expressions, 1974, JACM.

[2] Tao Yang et al. On the Granularity and Clustering of Directed Acyclic Task Graphs, 1993, IEEE Trans. Parallel Distributed Syst.

[3] Michael G. Norman et al. Models of machines and computation for mapping in multicomputers, 1993, CSUR.

[4] Mihalis Yannakakis et al. Towards an architecture-independent analysis of parallel algorithms, 1990, STOC '88.

[5] Welf Löwe et al. On finding optimal clusterings of task graphs, 1995, Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis.

[6] Richard M. Karp et al. Parallel Algorithms for Shared-Memory Machines, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[7] Beniamino Di Martino et al. Parallelization of Non-Simultaneous Iterative Methods for Systems of Linear Equations, 1994, CONPAR.

[8] W. Zimmermann et al. On the implementation of virtual shared memory, 1993, Proceedings of the Workshop on Programming Models for Massively Parallel Computers.

[9] Leslie G. Valiant et al. General Purpose Parallel Architectures, 1991, Handbook of Theoretical Computer Science, Volume A: Algorithms and Complexity.

[10] Richard M. Karp et al. Optimal broadcast and summation in the LogP model, 1993, SPAA '93.

[11] Welf Löwe et al. An Approach to Machine-Independent Parallel Programming, 1994, CONPAR.

[12] Vivek Sarkar et al. Partitioning and Scheduling Parallel Programs for Multiprocessing, 1989.

[13] Welf Löwe et al. Optimization of PRAM-Programs with Input-Dependent Memory Access, 1995, Euro-Par.

[14] Welf Löwe et al. Programming Data-Parallel -- Executing Process-Parallel, 1995.

[15] Ramesh Subramonian et al. LogP: towards a realistic model of parallel computation, 1993, PPOPP '93.

[16] Mihalis Yannakakis et al. Towards an Architecture-Independent Analysis of Parallel Algorithms, 1990, SIAM J. Comput.