LoGPC: Modeling Network Contention in Message-Passing Programs

In many real applications, such as those with frequent, irregular communication patterns or those using large messages, network contention and contention for message-processing resources can account for a significant part of the total execution time. This paper presents a new cost model, called LoGPC, that extends the LogP and LogGP models to account for the impact of network contention and network interface DMA behavior on the performance of message-passing programs. We validate LoGPC by analyzing three applications implemented with Active Messages on the MIT Alewife multiprocessor. Our analysis shows that network contention accounts for up to 50 percent of the total execution time. In addition, we show that communication locality changes communication costs by at most a factor of two on Alewife. Finally, we use the model to identify trade-offs between synchronous and asynchronous message-passing styles.
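As a rough illustration of the baseline that LoGPC extends, the sketch below computes the standard contention-free LogGP time for a single k-byte message, o + (k - 1)G + L + o. It then shows what share of the total an extra contention delay would represent. The function names, the parameter values, and the single additive `contention_delay` term are assumptions made for illustration only; the abstract does not give LoGPC's actual contention equations, and this sketch does not attempt to reproduce them.

```python
def loggp_message_time(L, o, G, k):
    """Contention-free end-to-end time for one k-byte message under LogGP.

    L: network latency, o: per-message processor overhead (paid once by the
    sender and once by the receiver), G: gap per byte for long messages.
    All values share one time unit (e.g. cycles).
    """
    if k < 1:
        raise ValueError("k must be at least 1 byte")
    return o + (k - 1) * G + L + o


def contention_fraction(base_time, contention_delay):
    """Share of total time spent on contention.

    Assumption: contention is treated as one additive delay on top of the
    contention-free time. This is a simplification, not the LoGPC model.
    """
    return contention_delay / (base_time + contention_delay)


# Hypothetical parameter values, chosen only to keep the arithmetic simple.
base = loggp_message_time(L=10, o=2, G=1, k=5)
print(base)                                              # 18
print(contention_fraction(base, contention_delay=base))  # 0.5
```

In this simplified picture, a contention delay equal to the contention-free time makes contention half of the total, which is the scale of the "up to 50 percent" figure reported above.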
