The impact of data transfer and buffering alternatives on network interface design

The explosive growth in the performance of microprocessors and networks has created a new opportunity to reduce the latency of fine-grain communication. Microprocessor clock speeds are now approaching the gigahertz range, and network switch latencies have dropped to tens of nanoseconds. Unfortunately, this growth also exposes processor accesses to the network interface (NI) as a critical bottleneck for fine-grain communication. Researchers have proposed several techniques to alleviate this NI access bottleneck, such as block loads and stores, user-level DMA, and coherent network interfaces. We systematically identify, examine, and evaluate the key parameters that underlie these design alternatives. We classify these parameters into two categories: data transfer parameters and buffering parameters. The data transfer parameters capture how messages are transferred between a computer's internal memory structures (e.g., processor caches and main memory) and a memory bus NI. The buffering parameters capture how and where an NI buffers incoming network messages. We evaluate seven memory bus NIs that we believe capture the essential components of the design space exposed by these data transfer and buffering parameters.
