Overcoming the memory wall in packet processing

The overhead of memory accesses limits the performance of packet processing applications. To overcome this bottleneck, today's network processors can utilize a wide range of mechanisms, such as a multi-level memory hierarchy, wide-word accesses, special-purpose result-caches, asynchronous memory, and hardware multi-threading. However, supporting all of these mechanisms complicates programmability and hardware design, and wastes system resources. In this paper, we address the following fundamental question: what minimal set of hardware mechanisms must a network processor support to achieve the twin goals of simplified programmability and high packet throughput? We show that no single mechanism suffices; the minimal set must include data-caches and multi-threading. Data-caches and multi-threading are complementary: whereas data-caches exploit locality to reduce the number of context switches and the off-chip memory bandwidth requirement, multi-threading exploits parallelism to hide long cache-miss latencies.
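To make the complementarity concrete, the following Python sketch uses a simple analytical model. This is a standard saturating model of block multithreading, not the paper's own methodology. Each thread runs for R useful cycles between off-chip misses, pays a context-switch cost C, then waits L cycles for memory, so core utilization with N threads is min(R/(R+C), N·R/(R+C+L)). A data-cache raises R by lowering the miss rate; extra threads raise N. All names and parameter values (1 GHz clock, 300-cycle latency, 2-cycle switch, 64-byte lines, 200 cycles and 10 references per packet) are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Workload:
    compute_cycles: float   # useful cycles per packet (cache hits folded in)
    accesses: float         # data memory references per packet
    miss_rate: float        # fraction of references that go off-chip (1.0 = no cache)


@dataclass
class Core:
    clock_hz: float = 1.0e9     # hypothetical 1 GHz core
    mem_latency: float = 300.0  # hypothetical off-chip latency, in cycles
    switch_cost: float = 2.0    # hypothetical context-switch cost, in cycles
    line_bytes: int = 64        # hypothetical bytes fetched per miss


def run_length(w: Workload) -> float:
    """Average useful cycles a thread executes between two off-chip misses (R)."""
    misses = w.accesses * w.miss_rate
    return math.inf if misses == 0 else w.compute_cycles / misses


def utilization(w: Workload, core: Core, threads: int) -> float:
    """Fraction of cycles spent on useful work: min(saturated, linear)."""
    r = run_length(w)
    if math.isinf(r):
        return 1.0
    c, lat = core.switch_cost, core.mem_latency
    saturated = r / (r + c)               # enough threads to cover every miss
    linear = threads * r / (r + c + lat)  # too few threads: core idles on memory
    return min(saturated, linear)


def threads_to_saturate(w: Workload, core: Core) -> int:
    """Smallest thread count that fully hides the miss latency."""
    r = run_length(w)
    if math.isinf(r):
        return 1
    return math.ceil((r + core.switch_cost + core.mem_latency) / (r + core.switch_cost))


def throughput_pps(w: Workload, core: Core, threads: int) -> float:
    """Packets per second: useful cycles per second / useful cycles per packet."""
    return core.clock_hz * utilization(w, core, threads) / w.compute_cycles


def offchip_bandwidth(w: Workload, core: Core, threads: int) -> float:
    """Off-chip bytes per second generated by cache misses."""
    misses_per_packet = w.accesses * w.miss_rate
    return throughput_pps(w, core, threads) * misses_per_packet * core.line_bytes


if __name__ == "__main__":
    core = Core()
    no_cache = Workload(compute_cycles=200, accesses=10, miss_rate=1.0)
    cached = Workload(compute_cycles=200, accesses=10, miss_rate=0.1)
    for name, w in [("no cache", no_cache), ("cache", cached)]:
        print(f"{name}: threads needed = {threads_to_saturate(w, core)}")
        for n in (1, 2, 4, 8):
            u = utilization(w, core, n)
            bw = offchip_bandwidth(w, core, n) / 1e9
            print(f"  {n} threads: utilization {u:.2f}, off-chip {bw:.2f} GB/s")
```

With these made-up numbers, a cache alone (one thread) and threads alone (eight threads, no cache) each leave the core less than half utilized. The uncached workload needs 15 threads to saturate, and it also generates ten times more off-chip traffic per packet. The cached workload saturates with only 3 threads. The model ignores bank conflicts, bandwidth limits, and cache interference between threads, so it illustrates the trend rather than predicting real throughput.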
