Portable and Efficient Parallel Computing Using the BSP Model

The Bulk-Synchronous Parallel (BSP) model was proposed by Valiant as a standard interface between parallel software and hardware. In theory, the BSP model has been shown to allow asymptotically optimal execution of architecture-independent software on a variety of architectures. Our goal in this work is to examine experimentally the practical use of the BSP model on current parallel architectures. We describe the design and implementation of the Green BSP Library, a small library of functions that implements the BSP model, and of several applications written for this library. We then discuss the performance of the library and the application programs on several parallel architectures. Our results are positive: we demonstrate efficiency and portability over a range of parallel architectures, and we show that the BSP cost model is useful for predicting performance trends and estimating execution times.
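
To make the cost model mentioned above concrete, the following minimal sketch uses the standard BSP cost formula, in which a superstep costs w + g*h + L: w is the maximum local computation on any processor, h is the maximum number of words any processor sends or receives, g is the machine's per-word communication cost, and L is its barrier synchronization cost. The names `Machine`, `Superstep`, and `bsp_cost` are hypothetical and are not part of the Green BSP Library, and the parameter values are illustrative rather than measurements from this work.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Machine:
    """Hypothetical container for BSP machine parameters (assumed units: time steps)."""
    g: float  # communication cost per word of an h-relation
    L: float  # barrier synchronization cost per superstep


@dataclass(frozen=True)
class Superstep:
    """Hypothetical description of one superstep of a BSP program."""
    w: float  # maximum local computation performed by any processor
    h: int    # maximum number of words sent or received by any processor


def bsp_cost(steps: Iterable[Superstep], machine: Machine) -> float:
    """Estimate total running time as the sum of w + g*h + L over all supersteps."""
    return sum(s.w + machine.g * s.h + machine.L for s in steps)


if __name__ == "__main__":
    # Illustrative values only; real g and L must be measured on the target machine.
    machine = Machine(g=2.0, L=5.0)
    program = [Superstep(w=100.0, h=10), Superstep(w=50.0, h=0)]
    print(bsp_cost(program, machine))
```

Because g and L are the only machine-dependent terms, the same per-superstep (w, h) profile can be re-evaluated for a different architecture by swapping in that architecture's `Machine` values, which is the sense in which the cost model supports performance prediction across platforms.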
