Architectural Characterization of Processor Affinity in Network Processing

Network protocol stacks, in particular TCP/IP software implementations, are known for their inability to scale well in general-purpose monolithic operating systems (OS) on symmetric multiprocessor (SMP) systems. Previous researchers have experimented with affinitizing processes/threads, as well as interrupts from devices, to specific processors in an SMP system. However, general-purpose operating systems give minimal consideration to user-defined affinity in their schedulers. Our goal is to expose the full potential of affinity through in-depth characterization of the reasons behind the performance gains. We conducted an experimental study of TCP performance under various affinity modes on IA-based servers. Results showed that interrupt affinity alone provided a throughput gain of up to 25%, and that combined thread/process and interrupt affinity can achieve gains of 30%. In particular, calling out the impact of affinity on machine clears (in addition to cache misses) is a characterization that has not been done before.
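
As a concrete illustration of the two kinds of affinity discussed above, the following minimal sketch shows how they are commonly expressed on Linux. It is not the authors' experimental setup, which the abstract does not describe. The helper names `cpu_mask_hex` and `pin_current_process` are hypothetical. Process affinity uses Python's `os.sched_setaffinity`, which is only available on some platforms (Linux among them). Interrupt affinity on Linux is typically set by writing a hexadecimal CPU bitmask to `/proc/irq/<irq>/smp_affinity` as root; the sketch only computes that mask, and assumes fewer than 32 CPUs so that a single hex group suffices.

```python
import os


def cpu_mask_hex(cpus):
    """Return the hex bitmask for a collection of CPU indices (bit i set means CPU i)."""
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError("CPU index must be non-negative")
        mask |= 1 << cpu
    return format(mask, "x")


def pin_current_process(cpus):
    """Restrict the calling process to the given CPUs.

    Returns the resulting affinity set, or None when the platform
    does not provide os.sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return os.sched_getaffinity(0)
```

For example, `cpu_mask_hex({0, 1})` yields `"3"`, the mask that would steer an interrupt to CPUs 0 and 1, and `pin_current_process({0})` confines the calling process to CPU 0 when that CPU is in its allowed set. Pinning the protocol-processing thread and the device interrupt to the same processor is one way to realize the combined mode the abstract refers to.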
