Scalable and portable computing using the WPRAM model

Parallel machines are converging to a standard scalable architecture. A set of processors communicates through a network whose bisection bandwidth increases linearly with the number of processors. Each processor also has a large local memory, so aggregate memory bandwidth also increases linearly. In addition, support for uniform latencies means an algorithm does not depend on network locality to achieve good performance. It is widely believed that the uptake of highly parallel machines can be aided by a standard computational model for the development and analysis of algorithms, leading to scalable and portable performance. This paper describes recent research carried out using the WPRAM model. The model aims to provide a small but flexible set of operations that enables the implementation of highly concurrent algorithms with good practical performance. An associated cost system, targeted at the above class of machines, enables the analysis of alternative parallelisation methods, which can subsequently be "tuned" to the particular machine in use for optimal performance.
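To illustrate how a cost system can compare alternative parallelisations and select one for a given machine, the sketch below uses a generic BSP-style cost function. This is an assumption and is not the WPRAM cost system. The parameters `p` (processors), `g` (cost per word delivered) and `L` (fixed latency per communication step), together with the two reduction strategies and all function names, are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine parameters (BSP-style assumption, not WPRAM's own)."""
    p: int    # number of processors (assumed a power of two for the tree strategy)
    g: float  # cost per word delivered, in units of one local operation
    L: float  # fixed latency cost charged per communication step


def direct_reduce_cost(m: Machine, n: int) -> float:
    """Hypothetical strategy: each processor sums n/p values locally, then every
    processor sends one partial sum to processor 0, which adds the p-1 values."""
    local = n / m.p                   # local summation
    gather = (m.p - 1) * m.g + m.L    # processor 0 receives p-1 words in one step
    final = m.p - 1                   # p-1 additions on processor 0
    return local + gather + final


def tree_reduce_cost(m: Machine, n: int) -> float:
    """Hypothetical strategy: local sums, then log2(p) pairwise combining steps,
    each moving one word per active processor and performing one addition."""
    local = n / m.p
    steps = int(math.log2(m.p))
    return local + steps * (m.g + m.L + 1)


def choose_strategy(m: Machine, n: int) -> str:
    """Pick the cheaper strategy under the assumed cost model."""
    costs = {
        "direct": direct_reduce_cost(m, n),
        "tree": tree_reduce_cost(m, n),
    }
    return min(costs, key=costs.get)
```

Under these assumed costs, a small machine with a high per-step latency (for example `Machine(p=4, g=1.0, L=100.0)`) favours the single gather step, while a large machine (for example `Machine(p=1024, g=4.0, L=10.0)`) favours the logarithmic tree. This is the kind of machine-dependent choice that "tuning" refers to.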