Copy emulation in checksummed, multiple-packet communication

Data copying can be a bottleneck in end-to-end communication over high-speed networks. Emulated copy is an alternative I/O data passing scheme that preserves the API and integrity guarantees of copying but avoids the copying itself by using virtual memory manipulations: transient output copy-on-write (TCOW), input alignment, and page swapping. We characterize and evaluate the support necessary in network adapters for emulated copy in checksummed, multiple-packet communication. Our experiments on an ATM network show that: (1) emulated copy performs better than copying even without hardware checksumming support; (2) TCOW improves multiple-packet output performance without any hardware support or changes in applications; (3) page swapping provides similar additional improvements on multiple-packet input if there is input alignment, which requires either hardware support (early-demultiplexed or system-aligned buffering) or changes in applications (pooled or application-aligned buffering); and (4) the performance of application-aligned buffering is largely unaffected by header/data splitting, a common optimization. We propose a new optimization, buffer snap-off, that extends system-aligned buffering to the general case in which data transfer lengths and application input buffer lengths are arbitrary and unmatched.
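To make the role of input alignment concrete, the following is a minimal sketch, not the implementation evaluated here. It assumes a simplified model in which whole pages can be swapped only when the system buffer and the application buffer start at the same offset within a page, and any partial leading or trailing page is copied instead. The function name `plan_input`, its parameters, and the 4096-byte page size are hypothetical and chosen only for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


def plan_input(app_addr, sys_addr, length, page_size=PAGE_SIZE):
    """Return (swapped_bytes, copied_bytes) for one input of `length` bytes.

    app_addr: start address of the application input buffer (hypothetical)
    sys_addr: start address of the data in the system buffer (hypothetical)
    """
    if length <= 0:
        return (0, 0)

    # Input alignment: both buffers start at the same offset within a page.
    if app_addr % page_size != sys_addr % page_size:
        # Misaligned: in this simplified model, everything is copied.
        return (0, length)

    end = app_addr + length
    # First page boundary at or after the start of the application buffer.
    first_boundary = -(-app_addr // page_size) * page_size
    # Last page boundary at or before the end of the application buffer.
    last_boundary = (end // page_size) * page_size

    # Only whole pages between the two boundaries can be swapped.
    swapped = max(0, last_boundary - first_boundary)
    return (swapped, length - swapped)
```

Under these assumptions, a page-aligned 8192-byte input is swapped entirely, the same input at a matching nonzero page offset has only its one whole page swapped, and an input whose offsets differ is copied in full.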
