Modeling communication in parallel algorithms: a fruitful interaction between theory and systems?

Recently, several theoretical models of parallel architectures have been proposed to replace the PRAM as the model presented to the algorithm designer. A primary focus of the new models is to include the cost of interprocessor communication, which is increasingly important in modern parallel architectures. We argue that modeling the communication costs of the architecture or system is only one part of the problem. The other, and usually much more difficult, part is modeling the communication properties of the algorithm itself, which provides the necessary inputs to the architectural model for determining overall complexity. In this context, we make three main points: (i) A description of communication is incomplete without regard to its relationship with replication. We propose a description of the communication-replication relationship in terms of the working set hierarchy of an algorithm. (ii) Both inherent communication and the communication-replication relationship can be very difficult to model in the irregular, dynamic computations that are crucial in many real-world applications. We present some examples that demonstrate this difficulty. (iii) We believe that substantial leverage can be obtained in this effort from the computer systems community, which can provide a hierarchy of simulation and profiling tools, ranging from abstract to detailed, tailored to the needs of algorithm designers. We propose an initial set of simulation tools and discuss possible future refinements to this set.
