TCP Servers: Offloading TCP Processing in Internet Servers. Design, Implementation, and Performance

TCP Server is a system architecture that aims to offload network processing from the host(s) running an Internet server. The TCP Server can be executed on a dedicated processor, node, or intelligent network interface, using low-overhead, non-intrusive communication between it and the host(s) running the server application. In this paper, we present and evaluate two implementations of the TCP Server architecture: (1) using dedicated network processors on a symmetric multiprocessor (SMP) server and (2) using dedicated nodes on a cluster-based server built around a memory-mapped communication interconnect. We quantify the impact of offloading on the performance of network servers for these two implementations, using server applications with realistic workloads. For the scenarios we studied, we achieved performance gains of up to 30% with both the SMP-based and the cluster-based implementations. Based on our experience and results, we conclude that offloading network processing from the host processor using a TCP Server architecture is beneficial to server performance when the server is overloaded. Complete offloading of TCP/IP processing requires substantial computing resources on the TCP Server. Depending on the application workload, either the host processor or the TCP Server can become the bottleneck, which stresses the need for an adaptive scheme to balance the load between the host and the TCP Server.
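
The bottleneck observation can be illustrated with a minimal sketch. It assumes a simple two-stage model in which every request needs some CPU time on the host (application work) and some on the TCP Server (network processing), so sustained throughput is capped by the slower stage. The function name, parameters, and all numbers are hypothetical and chosen only for illustration; they are not measurements or a model from the paper.

```python
def pipeline_throughput(host_cpus, tcp_cpus, app_ms, net_ms):
    """Return (requests/sec, bottleneck stage) for a two-stage pipeline.

    Hypothetical model: each request costs app_ms of CPU time on the host
    and net_ms of CPU time on the TCP Server; each stage scales linearly
    with its number of processors.
    """
    host_rate = host_cpus * 1000.0 / app_ms   # requests/sec the host can sustain
    tcp_rate = tcp_cpus * 1000.0 / net_ms     # requests/sec the TCP Server can sustain
    bottleneck = "host" if host_rate < tcp_rate else "tcp_server"
    return min(host_rate, tcp_rate), bottleneck


# Illustrative workloads (made-up costs, in CPU milliseconds per request).
# Application-heavy workload: the host saturates first.
print(pipeline_throughput(host_cpus=3, tcp_cpus=1, app_ms=3.0, net_ms=0.5))
# Network-heavy workload: the TCP Server saturates first.
print(pipeline_throughput(host_cpus=3, tcp_cpus=1, app_ms=1.0, net_ms=2.0))
```

Under these assumptions, a fixed split of processors between the host and the TCP Server suits only one of the two workloads, which is the motivation for an adaptive scheme that shifts load between them.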
