Design trade-offs for user-level I/O architectures

To address the growing I/O bottleneck, next-generation distributed I/O architectures employ scalable point-to-point interconnects and minimize operating system overhead by providing user-level access to the I/O subsystem. Reduced I/O overhead allows I/O-intensive applications to employ latency-hiding techniques efficiently and thereby improve throughput. This paper presents the design of a novel scalable user-level I/O architecture and evaluates how much each architectural mechanism contributes to the overall performance improvement. Results demonstrate that eliminating data movement across protection domains is the dominant contributor to improved scalability. Eliminating system call and interrupt overhead has only a small additional benefit, which may not justify the additional hardware support required. While this evaluation is based on one specific design, the conclusions can be generalized to other user-level I/O architectures.