Supporting parallel applications on clusters of workstations: The intelligent network interface approach

This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by a high-speed network. This architecture permits: (1) the transfer of selected communication-related functionality from the host machine to the network interface coprocessor and (2) the exposure of this functionality directly to applications as instructions of a Virtual Communication Machine (VCM) implemented by the coprocessor. User-level code interacts directly with the network coprocessor: the host kernel only 'connects' the application to the VCM and does not participate in the data transfers. The distinctive feature of our design is its flexibility: the integration of the network with the application can be varied to maximize performance. The resulting communication architecture is characterized by a very low overhead on the host processor, by latency and bandwidth close to the hardware limits, and by an application interface that enables zero-copy messaging and eases the porting of some shared-memory parallel applications to COWs. The architecture admits low-cost implementations based only on off-the-shelf hardware components. Additionally, its current ATM-based implementation can be used to communicate with any ATM-enabled host.
