A quantitative comparison of parallel computation models

This paper experimentally validates performance-related issues for parallel computation models on several parallel platforms (a MasPar MP-1 with 1024 processors, a 64-node GCel, and a CM-5 with 64 processors). Our work consists of three parts. First, there is an evaluation part in which we investigate whether the models correctly predict the execution time of an algorithm implementation. Unlike previous work, which mostly demonstrated a close match between the measured and predicted running times, this paper shows that there are situations in which the models do not precisely predict the actual execution time of an algorithm implementation. Second, there is a comparison part in which the models are contrasted with each other in order to determine which model induces the fastest algorithms. Finally, there is an efficiency validation part in which the performance of the model-derived algorithms is compared with the performance of highly optimized library routines to show the effectiveness of deriving fast algorithms through the formalisms of the models.

[38]  H. Wijshoff,et al.  A quantitative comparison of parallel computation models , 1998, TOCS.