Achieving 10Gbps network processing: are we there yet?

Scaling TCP/IP receive-side processing to 10Gbps speeds on commercial server platforms has been a major challenge. This challenge led to the development of two key techniques: Large Receive Offload (LRO) and Direct Cache Access (DCA). Systems supporting both techniques have only recently become available, so we evaluate them using 10 Gigabit NICs to find out whether 10Gbps rates are finally achievable. We evaluate the two techniques in detail to understand the performance benefit they offer and the major overheads that remain. Our measurements show that LRO and DCA together improve TCP/IP receive performance by more than 50% over the base case (no LRO and no DCA). Combined with improvements in the CPU architecture and the rest of the platform over the last 3-4 years, these two techniques have more than doubled TCP/IP receive processing throughput, to 7Gbps. Our detailed architectural characterization of TCP/IP processing with these two features enabled reveals that buffer management and copy operations still take up a significant amount of processing time. We also analyze the scaling behavior of TCP/IP to determine how multi-core architectures improve network processing. This part of our analysis highlights some limiting factors that need to be addressed to achieve scaling beyond 10Gbps.
