A Trace Driven Comparison of Latency Hiding Techniques for Network Processors

Caching, multithreading, and their combination are the major latency hiding techniques adopted in network processors (NPs). Although they have been highly successful in general purpose processors (GPPs), none of them has been well studied in the new context of packet processing. In this paper, we simulate the packet processing of a four-PE (processing element) network processor and thoroughly evaluate different configurations of these techniques with real-life packet traces. Our major findings are: (1) In general, all of these latency hiding techniques effectively increase the traffic throughput and robustness of the NP, but the thread allocation policy has a great impact on their performance. (2) If packets of the same flow may be assigned to different threads, multithreading keeps the PE in a working state as long as possible and produces less jitter in the packet sending rate than caching schemes; otherwise, a cache of reasonable size outperforms multithreading in almost all metrics, including traffic throughput, packet loss rate, queuing delay, and total delay. (3) When the access latency is comparable to the working time of the execution unit, the performance of multithreading is more sensitive to the packet arrival process and the memory reference pattern than that of caching. In short, caching and multithreading each have advantages in different environments. In some cases, combining caching and multithreading tends to bring more performance gain than simply adding more threads or cache entries.
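To make the trade-off between the two techniques concrete, the following is a minimal sketch of a textbook analytic model of PE utilization. It is not the trace-driven simulator used in this paper. It assumes every thread alternates a fixed compute burst with one memory access, that a cache hit costs no extra time, and that threads are always ready to run. The function name and all parameters (`compute`, `latency`, `n_threads`, `hit_rate`, `switch_cost`) are hypothetical and chosen only for illustration.

```python
def pe_utilization(compute, latency, n_threads=1, hit_rate=0.0, switch_cost=0.0):
    """Estimate the fraction of time a PE spends doing useful work.

    Illustrative analytic model only (assumed, not taken from the paper):
      compute     -- cycles of useful work between two memory accesses
      latency     -- cycles needed for one uncached memory access
      n_threads   -- hardware threads sharing the PE
      hit_rate    -- fraction of accesses served by a cache at no extra cost
      switch_cost -- cycles lost on each thread switch
    """
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must lie in [0, 1]")

    # A cache shortens the average stall seen by each thread.
    effective_latency = (1.0 - hit_rate) * latency

    # Saturated regime: enough threads to cover every stall, so only
    # the switch overhead is lost.
    saturated = compute / (compute + switch_cost)

    # Linear regime: too few threads, so the PE idles for part of each stall.
    linear = n_threads * compute / (compute + switch_cost + effective_latency)

    return min(saturated, linear)


if __name__ == "__main__":
    # Access latency comparable to compute time, as in finding (3).
    print("single thread, no cache:", pe_utilization(10, 10))
    print("two threads, no cache:  ", pe_utilization(10, 10, n_threads=2))
    print("one thread, 50% hits:   ", pe_utilization(10, 10, hit_rate=0.5))
    print("two threads, 50% hits:  ", pe_utilization(10, 30, n_threads=2, hit_rate=0.5))
```

The model deliberately leaves out the packet arrival process, flow-to-thread allocation constraints, and memory reference locality, which are exactly the factors the trace-driven evaluation reports as decisive. It therefore shows only why either technique can hide latency in principle, not which one wins on real traffic.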
