Towards a Theory of Optimal Communication Pipelines

In this paper, we study how to minimize the latency of a message through a network that consists of a number of store-and-forward stages. This research is especially relevant for today's low-overhead communication subsystems that employ dedicated processing elements for protocol processing. We develop an abstract pipeline model that reveals a crucial performance tradeoff. We subsequently exploit this tradeoff and present a series of fragmentation algorithms designed to minimize message latency. We provide an experimental methodology that enables the construction of customized pipeline algorithms that can adapt to the specific pipeline characteristics and application workloads. By applying this methodology to the Myrinet-GAM system, we have improved its latency by up to 51%. We also study the effectiveness of this technique for other realistic cases.
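
The tradeoff behind fragmentation can be illustrated with a small sketch. It is not the paper's model or any of its algorithms; it assumes a hypothetical cost model in which each store-and-forward stage charges a fixed per-fragment overhead plus a per-byte cost, handles one fragment at a time, and has unbounded buffering. The stage parameters and the names `pipeline_latency`, `split`, and `best_fragment_size` are illustrative only. Smaller fragments let stages overlap their work, but every extra fragment pays the fixed overhead again.

```python
def pipeline_latency(fragment_sizes, stages):
    """Time until the last fragment leaves the last stage.

    stages: list of (fixed_overhead, per_byte_cost) pairs, one per stage.
    Assumed model: a stage processes one fragment at a time, and a fragment
    enters a stage only after it has fully left the previous one.
    """
    stage_free_at = [0.0] * len(stages)  # when each stage finishes its current work
    for size in fragment_sizes:
        ready = 0.0  # all fragments are available at the source at time 0
        for i, (fixed, per_byte) in enumerate(stages):
            start = max(ready, stage_free_at[i])
            stage_free_at[i] = start + fixed + per_byte * size
            ready = stage_free_at[i]
    return stage_free_at[-1]


def split(message_bytes, fragment_bytes):
    """Cut a message into fixed-size fragments; the last one may be shorter."""
    full, rest = divmod(message_bytes, fragment_bytes)
    return [fragment_bytes] * full + ([rest] if rest else [])


def best_fragment_size(message_bytes, stages, candidates):
    """Pick the candidate fragment size with the lowest modeled latency."""
    return min(candidates,
               key=lambda f: pipeline_latency(split(message_bytes, f), stages))


# Hypothetical two-stage pipeline: (fixed overhead, per-byte cost) in arbitrary time units.
stages = [(10, 1), (5, 2)]
for frag in (64, 16, 1):
    print(frag, pipeline_latency(split(64, frag), stages))
print("best:", best_fragment_size(64, stages, [1, 2, 4, 8, 16, 32, 64]))
```

Under these assumed parameters, sending the 64-byte message whole and cutting it into 1-byte fragments both do worse than an intermediate size, which is the kind of tension a fragmentation algorithm has to balance.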
