A quantitative comparison of parallel computation models

In recent years, a large number of parallel computation models have been proposed to replace the PRAM as the parallel computation model presented to the algorithm designer. Although the theoretical justifications for these models are mostly sound, and many algorithmic results were obtained through these models, little experimentation has been conducted to validate the effectiveness of these models for developing cost-effective algorithms and applications on existing hardware platforms. This article makes a first attempt at a detailed experimental account of the precision of these models. To achieve this, three models (BSP, E-BSP, and BPRAM) were selected and validated on five parallel platforms (Cray T3E, Thinking Machines CM-5, Intel Paragon, MasPar MP-1, and Parsytec GCel). The work described in this article consists of three parts. First, the predictive capabilities of the models are investigated. Unlike previous experimental work, which mostly demonstrated a close match between the measured and predicted execution times, this article shows that there are several situations in which the models do not precisely predict the actual runtime behavior of an algorithm implementation. Second, a comparison between the models is provided in order to determine the model that induces the most efficient algorithms. Lastly, the performance achieved by the model-derived algorithms is compared with the performance attained by machine-specific algorithms in order to examine the effectiveness of deriving fast algorithms through the formalisms of the models.
