PacketShader: A GPU-Accelerated Software Router

We present PacketShader, a high-performance software router framework for general packet processing with Graphics Processing Unit (GPU) acceleration. PacketShader exploits the massively parallel processing power of GPUs to address the CPU bottleneck in current software routers. Combined with our high-performance packet I/O engine, PacketShader outperforms existing software routers by more than a factor of four, forwarding 64B IPv4 packets at 39 Gbps on a single commodity PC. We have implemented IPv4 and IPv6 forwarding, OpenFlow switching, and IPsec tunneling to demonstrate the flexibility and performance advantage of PacketShader. The evaluation results show that the GPU delivers significantly higher throughput than the CPU-only implementation, confirming the effectiveness of GPUs for computation- and memory-intensive operations in packet processing.
