Multiple pattern matching for network security applications: Acceleration through vectorization

Abstract

As new network attacks emerge and network traffic grows in volume, the need to inspect network traffic at high rates keeps increasing. Pattern matching is at the core of many security applications that inspect network traffic, such as Network Intrusion Detection Systems. At the same time, it is a major performance bottleneck for these applications: it has been shown to account for more than 70% of the total running time of Intrusion Detection Systems. Although numerous efficient approaches to this problem have been proposed for custom hardware, it is challenging for pattern matching algorithms to benefit from advances in commodity hardware. This becomes even more relevant with the adoption of Network Function Virtualization, which moves network services such as Network Intrusion Detection to the cloud, where scaling on commodity hardware is key to performance.

In this paper, we tackle the problem of pattern matching and show how to leverage the architectural features of commodity platforms. We present efficient algorithmic designs that achieve good cache locality and use modern vectorization techniques to exploit data parallelism within each core. First, we identify properties of pattern matching that make it amenable to vectorization and show how to use them in the algorithmic design. Second, we build on an earlier, cache-aware algorithmic design and show how to combine cache locality with SIMD gather instructions for pattern matching. Third, we complement our algorithms with an analytical model that predicts their performance and can be used to evaluate alternative designs easily. We evaluate our designs on open data sets of real-world network traffic. Our results on two different platforms, Haswell and Xeon Phi, show speedups of 1.8x and 3.6x, respectively, over Direct Filter Classification (DFC), a recently proposed algorithm by Choi et al. for pattern matching that exploits cache locality, and a speedup of more than 2.3x over Aho–Corasick, a widely used algorithm in today's Intrusion Detection Systems. Finally, we evaluate the scalability of our algorithms on highly parallel hardware platforms and compare them with parallel implementations of DFC and Aho–Corasick, achieving a processing throughput of up to 45 Gbps, close to twice that of Aho–Corasick.
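To make the combination of a cache-resident filter and SIMD gather more concrete, here is a minimal, hypothetical sketch. It is loosely modeled on the direct-filter idea associated with DFC: every 2-byte window of the input indexes a 2^16-bit bitmap (8 KiB, small enough to stay in L1 cache) that marks the first two bytes of each pattern, and only positions that pass the filter are verified. The inner loop processes `LANES` positions at a time and stands in for one SIMD iteration, where the list comprehension that reads `bitmap[j >> 3]` plays the role of a gather instruction (a real AVX2 version would use something like `_mm256_i32gather_epi32` on 32-bit words). The names `LANES`, `build_filter`, `candidate_positions` and `match` are illustrative assumptions, and the naive verification step is not DFC's or this paper's actual design.

```python
from __future__ import annotations

from typing import Iterable

# Hypothetical vector width (e.g. 8 x 32-bit lanes for AVX2); illustrative only.
LANES = 8


def build_filter(patterns: Iterable[bytes]) -> bytearray:
    """Build a 2^16-bit bitmap marking the first two bytes of every pattern."""
    bitmap = bytearray(1 << 13)  # 65536 bits = 8 KiB, fits in a typical L1 data cache
    for p in patterns:
        if len(p) < 2:
            raise ValueError("this sketch assumes patterns of length >= 2")
        idx = p[0] | (p[1] << 8)  # 16-bit index from a 2-byte window
        bitmap[idx >> 3] |= 1 << (idx & 7)
    return bitmap


def candidate_positions(text: bytes, bitmap: bytearray) -> list[int]:
    """Return start positions whose 2-byte window passes the filter.

    Each outer iteration emulates one SIMD step over LANES positions:
    compute a vector of indices, 'gather' the bitmap bytes, test the bits.
    """
    hits: list[int] = []
    n = len(text) - 1  # number of 2-byte windows
    for base in range(0, n, LANES):
        lanes = range(base, min(base + LANES, n))
        idxs = [text[i] | (text[i + 1] << 8) for i in lanes]       # vector of indices
        words = [bitmap[j >> 3] for j in idxs]                      # emulated gather
        mask = [(w >> (j & 7)) & 1 for w, j in zip(words, idxs)]    # per-lane bit test
        hits.extend(i for i, m in zip(lanes, mask) if m)
    return hits


def match(text: bytes, patterns: list[bytes]) -> list[tuple[int, bytes]]:
    """Filter first, then verify every candidate exactly (naive verification)."""
    bitmap = build_filter(patterns)
    out: list[tuple[int, bytes]] = []
    for pos in candidate_positions(text, bitmap):
        for p in patterns:
            if text.startswith(p, pos):
                out.append((pos, p))
    return out


if __name__ == "__main__":
    print(match(b"GET /index.html attack", [b"attack", b"index"]))
```

For the input above, only the windows `in` (position 5) and `at` (position 16) pass the filter, so just two positions reach the verification step, which reports `(5, b'index')` and `(16, b'attack')`. The design choice this illustrates is that the filter rejects most input positions using a small, cache-resident structure, which keeps expensive verification rare.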

[1] David Brumley et al. SplitScreen: Enabling efficient, distributed malware detection, 2010, Journal of Communications and Networks.

[2] Somesh Jha et al. Deflating the big bang: fast and scalable deep packet inspection with extended finite automata, 2008, SIGCOMM '08.

[3] Kenneth A. Ross et al. Rethinking SIMD Vectorization for In-Memory Databases, 2015, SIGMOD Conference.

[4] Alfred V. Aho et al. Efficient string matching, 1975, Commun. ACM.

[5] Keith W. Ross et al. Computer networking: a top-down approach featuring the internet, 2000.

[6] Matteo Frigo et al. Cache-oblivious algorithms, 1999, 40th Annual Symposium on Foundations of Computer Science.

[7] Magnus Almgren et al. CLort: High Throughput and Low Energy Network Intrusion Detection on IoT Devices with Embedded GPUs, 2018, NordSec.

[8] Wolfram Schulte et al. Data-parallel finite-state machines, 2014, ASPLOS.

[9] Martin Roesch. Snort - Lightweight Intrusion Detection for Networks, 1999.

[10] Peng Jiang et al. Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation, 2017, PPoPP.

[11] Gérard Berry et al. From Regular Expressions to Deterministic Automata, 1986, Theor. Comput. Sci.

[12] Konstantinos G. Margaritis et al. Multiple String Matching on a GPU using CUDAs, 2015, Scalable Comput. Pract. Exp.

[13] Filip De Turck et al. Network Function Virtualization: State-of-the-Art and Research Challenges, 2015, IEEE Communications Surveys & Tutorials.

[14] Dongsu Han et al. DFC: Accelerating String Pattern Matching for Network Applications, 2016, NSDI.

[15] Donald E. Knuth et al. Fast Pattern Matching in Strings, 1977, SIAM J. Comput.

[16] Marina Papatriantafilou et al. STRETCH: Scalable and Elastic Deterministic Streaming Analysis with Virtual Shared-Nothing Parallelism, 2019, DEBS.

[17] Kenneth A. Ross et al. SIMD-accelerated regular expression matching, 2016, DaMoN '16.

[18] Dionisios N. Pnevmatikatos et al. Pre-decoded CAMs for efficient and high-speed NIDS pattern matching, 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[19] Sotiris Ioannidis et al. Gnort: High Performance Network Intrusion Detection Using Graphics Processors, 2008, RAID.

[20] Gerhard Wellein et al. Comparing the performance of different x86 SIMD instruction sets for a medical imaging application on modern multi- and manycore chips, 2014, WPMVP '14.

[21] Harry Chang et al. Hyperscan: A Fast Multi-pattern Regex Matcher for Modern CPUs, 2019, NSDI.

[22] Fabrizio Petrini et al. Peak-Performance DFA-based String Matching on the Cell Processor, 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[23] Robert C. Atkinson et al. A Highly-Efficient Memory-Compression Scheme for GPU-Accelerated Intrusion Detection Systems, 2014, SIN.

[24] Sotiris Ioannidis et al. Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures, 2017, IEEE/ACM Transactions on Networking.

[25] Gene M. Amdahl. Validity of the single processor approach to achieving large scale computing capabilities, 1967, AFIPS '67 (Spring).

[26] Min Chen et al. Software-Defined Network Function Virtualization: A Survey, 2015, IEEE Access.

[27] Kurt Mehlhorn et al. On a Model of Virtual Address Translation, 2015, ACM J. Exp. Algorithmics.

[28] Simone Faro et al. Fast Multiple String Matching Using Streaming SIMD Extensions Technology, 2012, SPIRE.

[29] Magnus Almgren et al. Multiple Pattern Matching for Network Security Applications: Acceleration through Vectorization, 2017, 2017 46th International Conference on Parallel Processing (ICPP).

[30] Cheng-Hung Lin et al. Accelerating Pattern Matching Using a Novel Parallel Algorithm on GPUs, 2013, IEEE Transactions on Computers.

[31] Ali A. Ghorbani et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection, 2012, Comput. Secur.

[32] Gustavo Alonso. How Hardware Evolution is Driving Software Systems, 2019, DEBS.

[33] Arian Maghazeh et al. Pattern matching in OpenCL: GPU vs CPU energy consumption on two mobile chipsets, 2014, IWOCL '14.

[34] Kenneth A. Ross et al. Vectorized Bloom filters for advanced SIMD processors, 2014, DaMoN '14.

[35] KyoungSoo Park et al. APUNet: Revitalizing GPU as Packet Processing Accelerator, 2017, NSDI.

[36] Konstantinos G. Margaritis et al. String Matching on a Multicore GPU Using CUDA, 2009, 2009 13th Panhellenic Conference on Informatics.

[37] Tyler Akidau. Open Problems in Stream Processing: A Call To Action, 2019, DEBS.

[38] Evangelos P. Markatos et al. Generating realistic workloads for network intrusion detection systems, 2004.

[39] Philip K. Chan et al. An Analysis of the 1999 DARPA/Lincoln Laboratory Evaluation Data for Network Anomaly Detection, 2003, RAID.

[40] J.B.D. Cabrera et al. On the statistical distribution of processing times in network intrusion detection, 2004, 2004 43rd IEEE Conference on Decision and Control (CDC).

[41] Evangelos P. Markatos et al. Generating realistic workloads for network intrusion detection systems, 2004, WOSP '04.

[42] David G. Andersen et al. Exact pattern matching with feed-forward bloom filters, 2012, ACM J. Exp. Algorithmics.

[43] Sungryoul Lee et al. Kargus: a highly-scalable software-based intrusion detection system, 2012, CCS.

[44] Robert S. Boyer et al. A fast string searching algorithm, 1977, Commun. ACM.