A high-throughput DPI engine on GPU via algorithm/implementation co-optimization

The Graphics Processing Unit (GPU) is a promising platform for Deep Packet Inspection (DPI) because its rich parallelism and programmability suit DPI's requirements for high performance and frequent pattern updates. However, achieving a high-performance implementation is challenging because GPU performance is sensitive to algorithm and implementation issues such as memory overhead, thread divergence, and large lookup table sizes. In this paper, we propose algorithm and implementation co-optimization techniques that achieve high performance by reducing the required memory, removing thread divergence, optimizing memory access patterns, and optimizing for multithreading. To lower the implementation cost, we develop a GPU performance model that detects bottlenecks and provides design direction for the GPU kernel. Based on these optimization techniques, a prototype DPI implementation achieves 150 Gb/s on a single NVIDIA K20 GPU.

In summary, this paper identifies the GPU programming challenges for high-throughput DPI implementation, develops an analytical GPU performance model to explore the design space, presents algorithm and implementation co-optimization techniques, and prototypes a deep packet inspection engine on a GPU with up to 150 Gb/s throughput.
