Leveraging SIMD parallelism for accelerating network applications

Software packet processing frameworks are critical components of modern network architectures, and their performance has a direct impact on the quality of network services. Motivated by the growing number and capability of advanced vector instructions in recent mainstream CPUs, this paper explores a new parallel design and implementation of data structures and algorithms that are frequently used to build network applications. In particular, we propose effective SIMD optimization techniques for the Bloom filter and the Open vSwitch megaflow cache. We reduce memory access latency through careful prefetching and through a design that keeps up with the data demands of fast vector instructions. Our evaluation shows performance improvements of up to 162% for the Bloom filter and 48% for Open vSwitch compared with their scalar versions.
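To make the SIMD Bloom filter idea concrete, here is a minimal C sketch of a vectorized membership probe, assuming an AVX2-capable x86 CPU. Eight 32-bit keys are hashed, their filter words gathered, and their bits tested in parallel, one hash function per step. The struct `bloom_t`, the functions `bloom_init`, `bloom_insert`, `bloom_contains`, and `bloom_probe_batch`, the multiplicative hashing scheme, and the multiplier constants are all hypothetical choices made for illustration, not the paper's implementation, and prefetching is not shown. When the compiler does not target AVX2, the same function falls back to a scalar loop that gives identical answers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

#define MAX_K 8

/* Hypothetical: arbitrary odd 32-bit multipliers, one per hash function. */
static const uint32_t MULT[MAX_K] = {
    0x9E3779B1u, 0x85EBCA77u, 0xC2B2AE3Du, 0x27D4EB2Fu,
    0x165667B1u, 0xFD7046C5u, 0xB55A4F09u, 0x2545F491u
};

typedef struct {
    uint32_t *words; /* 2^log_m bits packed into 32-bit words */
    unsigned log_m;  /* log2 of the number of bits, 5 <= log_m <= 31 */
    unsigned k;      /* number of hash functions, 1 <= k <= MAX_K */
} bloom_t;

int bloom_init(bloom_t *bf, unsigned log_m, unsigned k) {
    if (log_m < 5 || log_m > 31 || k == 0 || k > MAX_K) return -1;
    bf->words = calloc((size_t)1 << (log_m - 5), sizeof(uint32_t));
    if (bf->words == NULL) return -1;
    bf->log_m = log_m;
    bf->k = k;
    return 0;
}

void bloom_free(bloom_t *bf) {
    free(bf->words);
    bf->words = NULL;
}

/* Multiplicative hashing: the top log_m bits of key * MULT[i] select a bit. */
static inline uint32_t bloom_bit(const bloom_t *bf, uint32_t key, unsigned i) {
    uint32_t h = key * MULT[i];
    return h >> (32 - bf->log_m);
}

void bloom_insert(bloom_t *bf, uint32_t key) {
    assert(bf->words != NULL);
    for (unsigned i = 0; i < bf->k; i++) {
        uint32_t b = bloom_bit(bf, key, i);
        bf->words[b >> 5] |= 1u << (b & 31);
    }
}

/* Scalar probe: returns 1 if key may be present, 0 if definitely absent. */
int bloom_contains(const bloom_t *bf, uint32_t key) {
    assert(bf->words != NULL);
    for (unsigned i = 0; i < bf->k; i++) {
        uint32_t b = bloom_bit(bf, key, i);
        if ((bf->words[b >> 5] & (1u << (b & 31))) == 0) return 0;
    }
    return 1;
}

/* Probes n keys; out[j] = 1 means "maybe present", 0 means "definitely absent". */
void bloom_probe_batch(const bloom_t *bf, const uint32_t *keys, uint8_t *out, size_t n) {
    size_t j = 0;
#if defined(__AVX2__)
    const __m256i one = _mm256_set1_epi32(1);
    const __m256i low5 = _mm256_set1_epi32(31);
    const __m256i zero = _mm256_setzero_si256();
    const __m128i shift = _mm_cvtsi32_si128((int)(32 - bf->log_m));
    for (; j + 8 <= n; j += 8) {
        __m256i x = _mm256_loadu_si256((const __m256i *)(keys + j));
        __m256i alive = _mm256_set1_epi32(-1); /* every lane starts as "maybe present" */
        for (unsigned i = 0; i < bf->k; i++) {
            /* Hash all eight keys at once and keep the top log_m bits. */
            __m256i h = _mm256_mullo_epi32(x, _mm256_set1_epi32((int)MULT[i]));
            h = _mm256_srl_epi32(h, shift);
            __m256i widx = _mm256_srli_epi32(h, 5);                           /* word index */
            __m256i mask = _mm256_sllv_epi32(one, _mm256_and_si256(h, low5)); /* bit in word */
            /* Gather the eight filter words and test the selected bits. */
            __m256i w = _mm256_i32gather_epi32((const int *)bf->words, widx, 4);
            __m256i miss = _mm256_cmpeq_epi32(_mm256_and_si256(w, mask), zero);
            alive = _mm256_andnot_si256(miss, alive);    /* clear lanes whose bit was 0 */
            if (_mm256_testz_si256(alive, alive)) break; /* all eight keys rejected */
        }
        int bits = _mm256_movemask_ps(_mm256_castsi256_ps(alive));
        for (int l = 0; l < 8; l++) out[j + l] = (uint8_t)((bits >> l) & 1);
    }
#endif
    /* Scalar fallback (no AVX2) and tail for the last n % 8 keys. */
    for (; j < n; j++) out[j] = (uint8_t)bloom_contains(bf, keys[j]);
}
```

Compiled with `-mavx2` on GCC or Clang, `bloom_probe_batch` takes the gather-based path for each full group of eight keys and uses `bloom_contains` for the remainder. Testing eight keys per hash function with branch-free lane operations replaces eight data-dependent scalar branches; whether the gathers actually win depends on filter size and cache behavior, which is why latency hiding such as prefetching matters.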