Accelerating Open vSwitch with Integrated GPU

With the rapid development of Software Defined Networking (SDN) and network virtualization, software-based virtual switches have become a critical component for providing network services to VMs. Among them, Open vSwitch (OvS) is a widely used and well-studied open source implementation. Using the Data Plane Development Kit (DPDK) with OvS to bypass the OS kernel and process packets in userspace provides substantial performance benefits on general purpose platforms. Integrated GPUs reside on the same die as the CPU on such platforms and offer advanced features such as CPU-GPU communication over the on-chip interconnect and shared physical/virtual memory, which makes them a promising additional compute resource for further accelerating OvS. In this paper, we design and implement an inline GPU-assisted OvS architecture that offloads the expensive tuple space search to the GPU and balances switching processing between the CPU and the GPU. We evaluated the performance on an Intel® Xeon® processor of the E3-1575M v5 product family (code-name Skylake) with an integrated GT4e GPU. The results show that our proposed architecture improved OvS throughput by 3x compared to the optimized CPU-only OvS-DPDK implementation.
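For readers unfamiliar with the tuple space search mentioned above, the following is a minimal sketch of the general technique, not the paper's implementation. It assumes a hypothetical two-field header (source and destination address) and illustrative names such as `TupleSpace` and `apply_mask`. Rules that share a wildcard mask are grouped into one hash table, and a lookup probes every table with the correspondingly masked key, keeping the highest-priority hit. Each probe is independent of the others, which is one reason the search lends itself to parallel execution.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Hypothetical 2-field packet header: (src_ip, dst_ip). Real switches match many more fields.
Key = Tuple[int, int]
Mask = Tuple[int, int]


def apply_mask(key: Key, mask: Mask) -> Key:
    """Zero out the wildcarded bits of each header field."""
    return tuple(k & m for k, m in zip(key, mask))


@dataclass
class SubTable:
    """All rules sharing one mask, stored in an exact-match hash table."""
    mask: Mask
    rules: Dict[Key, Tuple[int, str]] = field(default_factory=dict)  # masked key -> (priority, action)


class TupleSpace:
    def __init__(self) -> None:
        self.subtables: Dict[Mask, SubTable] = {}

    def insert(self, key: Key, mask: Mask, priority: int, action: str) -> None:
        sub = self.subtables.setdefault(mask, SubTable(mask))
        sub.rules[apply_mask(key, mask)] = (priority, action)

    def lookup(self, key: Key) -> Optional[str]:
        """Probe every subtable; return the action of the highest-priority match."""
        best: Optional[Tuple[int, str]] = None
        for sub in self.subtables.values():
            hit = sub.rules.get(apply_mask(key, sub.mask))
            if hit is not None and (best is None or hit[0] > best[0]):
                best = hit
        return None if best is None else best[1]


ts = TupleSpace()
# Low-priority rule matching any source in 10.0.0.0/8.
ts.insert((0x0A000000, 0), (0xFF000000, 0), priority=1, action="drop")
# High-priority exact-match rule for one source/destination pair.
ts.insert((0x0A000001, 0xC0A80001), (0xFFFFFFFF, 0xFFFFFFFF), priority=10, action="forward")
```

In this sketch the lookup cost grows with the number of distinct masks, since every subtable is probed once per packet.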
