Address translation mechanisms in network interfaces

Good network hardware performance is often squandered by the overhead of accessing the network interface (NI) within a host. NIs that support user-level messaging avoid frequent operating system (OS) action, yet unnecessary copying can still result in low performance. We explore improving application messaging performance by eliminating all unnecessary copies (minimal messaging). For minimal messaging, NIs must support address translation and must do so more richly than has been done in the past. NI address translation should flexibly support higher-level abstractions, map all of user space, exploit translation locality, and degrade gracefully when locality is poor. We classify NI address translation implementations based on where the lookup and the miss handling are performed (CPU or NI). We present alternative designs and consider how they interact with the OS. We provide simulation results that evaluate the alternative design points, and we demonstrate feasibility with a real implementation using Myrinet. We find that: NIs need not have hardware lookup structures, as software schemes are fast enough; it is difficult for an NI to handle its own translation misses unless commercial operating systems are substantially modified to view an NI as a CPU peer; in the conventional situation where the operating system views the NI as a device, minimal messaging should be used only when the translation is present, while a single-copy protocol is used when it is not; and, alternatively, one can currently get acceptable performance when the CPU handles misses if the kernel provides very fast trap interfaces, but microprocessor and operating system trends may make this alternative less viable in the long run.
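The third finding (use minimal messaging only when the translation is present, and fall back to a single-copy protocol otherwise) can be illustrated with a minimal sketch. This is not the authors' implementation: the direct-mapped table layout, the 4 KB page size, and the names `TranslationCache` and `choose_send_path` are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size, not taken from the paper


class TranslationCache:
    """Hypothetical software-managed, direct-mapped table:
    virtual page number -> physical frame number."""

    def __init__(self, num_entries=256):
        self.num_entries = num_entries
        self.tags = [None] * num_entries
        self.frames = [None] * num_entries

    def lookup(self, vpn):
        # A software lookup: index by the low bits, then compare the tag.
        slot = vpn % self.num_entries
        if self.tags[slot] == vpn:
            return self.frames[slot]
        return None  # translation miss

    def insert(self, vpn, frame):
        # Installing a translation evicts whatever shared the slot.
        slot = vpn % self.num_entries
        self.tags[slot] = vpn
        self.frames[slot] = frame


def choose_send_path(cache, vaddr, length):
    """Return 'minimal' if every page touched by the user buffer has a
    cached translation (the NI can move data directly from user memory),
    else 'single-copy' (data is staged through a pre-mapped buffer)."""
    if length <= 0:
        raise ValueError("length must be positive")
    first_vpn = vaddr // PAGE_SIZE
    last_vpn = (vaddr + length - 1) // PAGE_SIZE
    for vpn in range(first_vpn, last_vpn + 1):
        if cache.lookup(vpn) is None:
            return "single-copy"
    return "minimal"
```

In this sketch a buffer that lies entirely within pages already in the table takes the minimal path, while a buffer that spills onto an unmapped page takes the single-copy path, so a miss costs one extra copy rather than stalling the send.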