Improving Network Processing Concurrency using TCPServers

Exponentially growing bandwidth requirements and slowing gains in processor speeds have led to the popularity of multiprocessor architectures. Network stack parallelism is increasingly important to support such architectures. In this paper, we present techniques to improve network stack concurrency that build on our previous work, TCPServers, a system architecture for offloading network processing within an SMP system. TCPServers dedicates a subset of processors as packet processing engines (PPEs), which handle all asynchronous network events and perform receive processing. We introduce Receive Queues, per-socket data structures that store incoming network packets and are accessed exclusively at the PPEs. Using Receive Queues, we modify TCPServers-based network stacks to incorporate early packet demultiplexing. We also present an efficient proportional fair scheduling algorithm, which processes incoming packets at the priority of the destination socket. Our experimental evaluation demonstrates that our modifications reduce scheduling and synchronization overheads and improve aggregate TCP/IP throughput by up to 75% compared with the default SMP stack. We also show that our system sustains this throughput even when a large number of short-lived connections are present.
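
As a rough illustration of the two mechanisms named above, the following Python sketch models early demultiplexing into per-socket receive queues and a proportional-share pass over those queues. It is not the paper's implementation: the names `ReceiveQueue`, `demux`, and `schedule`, the dictionary-based packet with a `"flow"` key standing in for the connection 4-tuple, and the per-socket `weight` are all assumptions made for this example. The scheduler uses a simple stride-style rule as a stand-in; the abstract does not specify the actual algorithm.

```python
from collections import deque


class ReceiveQueue:
    """Hypothetical per-socket queue holding packets not yet processed."""

    def __init__(self, weight):
        self.weight = weight      # assumed priority of the destination socket
        self.packets = deque()    # packets waiting for receive processing
        self.pass_value = 0.0     # virtual time used by the stride-style rule


def demux(queues, packet):
    """Early demultiplexing: pick the destination queue before any
    protocol processing. Returns False if no socket matches the flow."""
    queue = queues.get(packet["flow"])
    if queue is None:
        return False              # no matching socket, so drop early
    queue.packets.append(packet)
    return True


def schedule(queues, budget):
    """Process up to `budget` packets, always serving the backlogged queue
    with the smallest pass value (stride-style proportional share)."""
    order = []
    for _ in range(budget):
        backlogged = [(q.pass_value, key) for key, q in queues.items() if q.packets]
        if not backlogged:
            break
        _, key = min(backlogged)
        queue = queues[key]
        queue.packets.popleft()                 # stand-in for receive processing
        queue.pass_value += 1.0 / queue.weight  # higher weight advances more slowly
        order.append(key)
    return order
```

With two backlogged queues of weights 2 and 1, a budget of six packets serves the heavier queue four times and the lighter queue twice, which is the proportional behavior the abstract describes at a high level.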
