Experimental evaluation of QSM, a simple shared-memory model

Parallel programming models should attempt to satisfy two conflicting goals. On the one hand, they should hide architectural details so that algorithm designers can write simple, portable programs. On the other hand, they must expose architectural details so that designers can evaluate and optimize the performance of their algorithms. In this paper, we experimentally examine the trade-offs made by a simple shared-memory model, QSM, to address this dilemma. The results indicate that analysis under the QSM model yields quite accurate results for reasonable input sizes and that algorithms developed under QSM achieve performance close to that obtainable through more complex models, such as BSP and LogP.
