Message-Passing for the 21st Century: Integrating User-Level Networks with SMT

We describe a new architecture that improves message-passing performance for both device I/O and interprocessor communication. The architecture integrates a simultaneous multithreading (SMT) processor with a user-level network interface that can directly schedule threads on the processor. Because the network interface initiates message-handling code directly at user level, most of the OS overhead of handling interrupts and dispatching to user code is eliminated. Because the processor is SMT, most of the latency of executing message handlers can be hidden. This paper presents measurements showing that the OS overheads for message passing are significant, and briefly describes our architecture and the simulation environment that we are building to evaluate it.
