A Cost Model for Communication on a Symmetric MultiProcessor

In this paper we conduct an in-depth study of the communication costs of programs run on a typical Symmetric MultiProcessor, the SGI Power Challenge, which is characterized by powerful off-the-shelf microprocessors that communicate through a shared memory over a shared-bus interconnect. Our study is based on an extensive set of experiments designed to assess the relative impact of a number of parameters on the cost of shared-memory accesses. We provide evidence that interaction with the memory hierarchy affects communication so substantially that none of the models previously considered in the literature can guarantee a reasonable level of accuracy, since they do not take this interaction into account. We then determine two functions that accurately predict the best-case and worst-case performance with respect to the memory hierarchy. Together these functions define a prediction interval that can be used to obtain lower and upper bounds on the actual communication cost of an application and to evaluate the degree of locality of the memory access patterns involved.
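As an illustration of how such a prediction interval might be used, the following minimal Python sketch brackets a measured communication cost between a best-case and a worst-case prediction and reports where it falls. The linear cost functions, their coefficients, and the names `best_case`, `worst_case`, `prediction_interval`, and `locality_score` are hypothetical placeholders chosen for this example. They are not the functions derived in the paper, which would have to be fitted to measurements on the target machine.

```python
from typing import Callable, Tuple

# A cost function maps a number of words communicated to a predicted time.
CostFn = Callable[[int], float]


def best_case(n_words: int) -> float:
    """Hypothetical best-case cost (high locality); coefficients are made up."""
    return 2.0 + 0.125 * n_words


def worst_case(n_words: int) -> float:
    """Hypothetical worst-case cost (poor locality); coefficients are made up."""
    return 2.0 + 0.5 * n_words


def prediction_interval(n_words: int, best: CostFn, worst: CostFn) -> Tuple[float, float]:
    """Return (lower, upper) bounds on the communication cost."""
    lower, upper = best(n_words), worst(n_words)
    if lower > upper:
        raise ValueError("best-case prediction exceeds worst-case prediction")
    return lower, upper


def locality_score(measured: float, lower: float, upper: float) -> float:
    """Place a measured cost inside the interval.

    Returns 1.0 at the lower bound (best locality) and 0.0 at the upper
    bound (worst locality); values outside the interval are clamped.
    """
    if upper == lower:
        return 1.0
    fraction = (upper - measured) / (upper - lower)
    return min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    lower, upper = prediction_interval(1000, best_case, worst_case)
    print(f"predicted cost lies in [{lower}, {upper}]")
    print(f"locality score of a measured cost of 314.5: {locality_score(314.5, lower, upper)}")
```

A score near 1.0 would suggest that the access pattern behaves close to the best case with respect to the memory hierarchy, and a score near 0.0 that it behaves close to the worst case. The linear mapping between the two bounds is only one possible choice.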
