Software overhead in messaging layers: where does the time go?

Despite improvements in network interfaces and software messaging layers, software communication overhead still dominates hardware routing cost in most systems. In this study, we identify the sources of this overhead by analyzing the software costs of typical communication protocols built atop the active messages layer on the CM-5. We show that up to 50–70% of software messaging cost is a direct consequence of the gap between what the network provides (arbitrary delivery order, finite buffering, and limited fault handling) and what users require (in-order delivery, end-to-end flow control, and reliable transmission). Virtually all of these costs can be eliminated if routing networks provide higher-level services such as in-order delivery, end-to-end flow control, and packet-level fault tolerance. We conclude that significant cost reductions require changing the constraints on messaging layers: we propose designing networks and network interfaces that simplify or replace the software that implements user communication requirements.
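
As an illustration of the kind of software the abstract has in mind, the following is a minimal sketch of receiver-side resequencing, one common way a messaging layer can provide in-order delivery over a network that delivers packets in arbitrary order. It is not the CM-5 active messages code; the names `Resequencer`, `receive`, and `deliver_all` are hypothetical, and buffering is assumed to be unbounded, which a real layer with finite buffering could not assume.

```python
class Resequencer:
    """Receiver-side reordering: hands packets to the user in sequence order.

    Hypothetical sketch; assumes the sender numbers packets 0, 1, 2, ...
    and that the receiver can buffer any number of early arrivals.
    """

    def __init__(self):
        self.next_seq = 0   # sequence number the user expects next
        self.pending = {}   # early arrivals, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet; return the payloads that are now deliverable."""
        self.pending[seq] = payload
        delivered = []
        # Drain every packet that has become contiguous with next_seq.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered


def deliver_all(arrivals):
    """Feed (seq, payload) pairs in arrival order; return the delivery order."""
    receiver = Resequencer()
    out = []
    for seq, payload in arrivals:
        out.extend(receiver.receive(seq, payload))
    return out
```

Every packet pays for a table insert, a lookup, and bookkeeping on the sequence counter, even when it arrives in order. This is per-message work of the sort that a network guaranteeing in-order delivery would make unnecessary.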
