On combining technology and theory in search of a parallel computation model

A fundamental problem in parallel computing is to design high-level, architecture-independent algorithms that execute efficiently on general-purpose parallel machines. The aim is to achieve portability and high performance simultaneously. A key to accomplishing this is the existence of a computation model that can bridge the gap between the high-level programming models and the underlying hardware models. Two current factors make this fundamental problem more tractable. The first is the emergence of a dominant parallel architecture consisting of a number of powerful microprocessors interconnected by either a proprietary interconnect or a standard off-the-shelf interconnect (such as an ATM switch). The second is the emergence of standards, such as the message-passing standard MPI, for which efficient implementations are either available or about to appear on most machines. Our recent work exploits these two developments through a methodology based on (1) a simple computation model for current MIMD platforms that incorporates communication cost into the complexity of algorithms, and (2) an SPMD programming model that makes effective use of communication primitives. We describe our approach for validating the computation model through extensive experimentation and the development of benchmarks, and discuss its extension to the emerging architecture of clusters of symmetric multiprocessors (SMPs).
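To make the SPMD style concrete, the following is a minimal, hypothetical sketch (not the authors' benchmark code) of an SPMD program in C using MPI. Every process runs the same program and participates in a single collective communication primitive, MPI_Alltoall (a total exchange); under a communication-aware computation model, the cost of this one primitive would be charged explicitly to the algorithm's complexity rather than hidden. The buffer size and the way data is tagged are illustrative assumptions.

```c
/* Minimal SPMD sketch (illustrative only): p processes perform a total
 * exchange with MPI_Alltoall, the kind of communication primitive whose
 * cost a communication-aware model accounts for explicitly. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, p;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &p);

    int block = 4;  /* number of ints sent to each process (assumed) */
    int *sendbuf = malloc((size_t)p * block * sizeof(int));
    int *recvbuf = malloc((size_t)p * block * sizeof(int));
    for (int i = 0; i < p * block; i++)
        sendbuf[i] = rank * 1000 + i;  /* tag data with the sender's rank */

    /* Single collective call: its cost (latency plus a term proportional
     * to p * block) enters the algorithm's complexity under the model. */
    MPI_Alltoall(sendbuf, block, MPI_INT,
                 recvbuf, block, MPI_INT, MPI_COMM_WORLD);

    printf("rank %d: first element received from rank 0 is %d\n",
           rank, recvbuf[0]);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Compiled with mpicc and launched with mpirun -np p, the same executable runs on every processor, which is what allows a single machine-independent cost expression, parameterized by p and the message sizes, to describe the program's communication behavior across platforms.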
