Hiding communication latency in data parallel applications

Interprocessor communication time can be a significant fraction of the overall execution time of data parallel applications. The large communication-to-computation ratios of the tasks these applications perform result in suboptimal performance on data parallel architectures. We present an alternative architectural framework, referred to as concurrently communicating SIMD (CCSIMD), which maintains the SIMD execution model while introducing a small degree of task parallelism to exploit communication concurrency. We introduce three implementations of this framework and illustrate their effect on a suite of data parallel applications. Results show that CCSIMD architectures provide a cost-effective way to hide communication latency in data parallel applications, thereby improving their performance.
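The core idea of overlapping communication with computation can be illustrated with a minimal sketch. This is not the CCSIMD hardware mechanism itself; it is an illustrative software analogy, with hypothetical `communicate` and `compute` routines whose costs are simulated by sleeps, showing why issuing a transfer concurrently with independent local work hides its latency.

```python
import threading
import time

def communicate():
    # Simulated interprocessor transfer (e.g., a boundary exchange);
    # the 0.2 s cost is an arbitrary stand-in for network latency.
    time.sleep(0.2)

def compute():
    # Simulated local computation on data independent of the transfer.
    time.sleep(0.2)

def sequential_step():
    # Baseline SIMD behavior: computation stalls until the transfer completes.
    start = time.perf_counter()
    communicate()
    compute()
    return time.perf_counter() - start

def overlapped_step():
    # CCSIMD-style overlap: start the transfer, compute concurrently, then join.
    start = time.perf_counter()
    t = threading.Thread(target=communicate)
    t.start()
    compute()
    t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"sequential: {sequential_step():.2f}s  overlapped: {overlapped_step():.2f}s")
```

With equal communication and computation costs, the overlapped step takes roughly half the time of the sequential one; in general, the achievable speedup depends on the communication-to-computation ratio of the application.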
