Bypassing Space Explosion in Regular Expression Matching for Network Intrusion Detection and Prevention Systems

Network intrusion detection and prevention systems commonly use regular expression (RE) signatures to represent individual security threats. While the corresponding DFA for any one RE is typically small, the DFA that corresponds to the entire set of REs is usually too large to be constructed or deployed. To address this issue, a variety of alternative automata implementations that compress the size of the final automaton have been proposed such as XFA and DFA. The resulting final automata are typically much smaller than the corresponding DFA. However, the previously proposed automata construction algorithms do suffer from some drawbacks. First, most employ a “Union then Minimize” framework where the automata for each RE are first joined before minimization occurs. This leads to an expensive NFA to DFA subset construction on a relatively large NFA. Second, most construct the corresponding large DFA as an intermediate step. In some cases, this DFA is so large that the final automaton cannot be constructed even though the final automaton is small enough to be deployed. In this paper, we propose a “Minimize then Union” framework for constructing compact alternative automata focusing on the DFA. We show that we can construct an almost optimal final DFA with small intermediate parsers. The key to our approach is a space and time efficient routine for merging two compact DFA into a compact DFA. In our experiments, our algorithm runs up to 302 times faster and uses 1390 times less memory than previous algorithms. For example, we are able to construct a DFA with over 80,000,000 states using only 1GB of main memory in only 77 minutes.

[1]  Anat Bremler-Barr,et al.  CompactDFA: Generic State Machine Compression for Scalable Pattern Matching , 2010, 2010 Proceedings IEEE INFOCOM.

[2]  Christopher R. Clark,et al.  Efficient Reconfigurable Logic Circuits for Matching Complex Network Intrusion Detection Patterns , 2003, FPL.

[3]  Viktor K. Prasanna,et al.  Fast Regular Expression Matching Using FPGAs , 2001, The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01).

[4]  George Varghese,et al.  Deterministic memory-efficient string matching algorithms for intrusion detection , 2004, IEEE INFOCOM 2004.

[5]  Eric Torng,et al.  Fast Regular Expression Matching Using Small TCAMs for Network Intrusion Detection and Prevention Systems , 2010, USENIX Security Symposium.

[6]  Somesh Jha,et al.  Deflating the big bang: fast and scalable deep packet inspection with extended finite automata , 2008, SIGCOMM '08.

[7]  John W. Lockwood,et al.  Implementation of a content-scanning module for an Internet firewall , 2003, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003..

[8]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM.

[9]  Martin Roesch,et al.  Snort - Lightweight Intrusion Detection for Networks , 1999 .

[10]  Vijay Kumar,et al.  High Speed Pattern Matching for Network IDS/IPS , 2006, Proceedings of the 2006 IEEE International Conference on Network Protocols.

[11]  Liu Yang,et al.  Fast, memory-efficient regular expression matching with NFA-OBDDs , 2011, Comput. Networks.

[12]  Vern Paxson,et al.  Enhancing byte-level network intrusion detection signatures with context , 2003, CCS '03.

[13]  Patrick Crowley,et al.  An improved algorithm to accelerate regular expression evaluation , 2007, ANCS '07.

[14]  Jonathan S. Turner,et al.  Advanced algorithms for fast and scalable deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[15]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[16]  Patrick Crowley,et al.  Efficient regular expression evaluation: theory to practice , 2008, ANCS '08.

[17]  Dionisios N. Pnevmatikatos,et al.  Fast, Large-Scale String Match for a 10Gbps FPGA-Based Network Intrusion Detection System , 2003, FPL.

[18]  Randy Smith,et al.  Efficient signature matching with multiple alphabet compression tables , 2008, SecureComm.

[19]  Patrick Crowley,et al.  Extending finite automata to efficiently match Perl-compatible regular expressions , 2008, CoNEXT '08.

[20]  Timothy Sherwood,et al.  A high throughput string matching architecture for intrusion detection and prevention , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[21]  Laxmi N. Bhuyan,et al.  Compiling PCRE to FPGA for accelerating SNORT IDS , 2007, ANCS '07.

[22]  Srihari Cadambi,et al.  Memory-Efficient Regular Expression Search Using State Merging , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[23]  Jeffrey D. Ullman,et al.  Introduction to Automata Theory, Languages and Computation , 1979 .

[24]  George Varghese,et al.  Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia , 2007, ANCS '07.

[25]  Li Guo,et al.  Accelerating DFA Construction by Hierarchical Merging , 2011, 2011 IEEE Ninth International Symposium on Parallel and Distributed Processing with Applications.

[26]  Somesh Jha,et al.  XFA: Faster Signature Matching with Extended Automata , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[27]  P L Shepherd,et al.  Theory to practice: the elusive marriage. , 1983 .

[28]  Dionisios N. Pnevmatikatos,et al.  Pre-decoded CAMs for efficient and high-speed NIDS pattern matching , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[29]  Christopher R. Clark,et al.  Scalable pattern matching for high speed networks , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[30]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[31]  Patrick Crowley,et al.  A hybrid finite automaton for practical deep packet inspection , 2007, CoNEXT '07.

[32]  Ron K. Cytron,et al.  A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[33]  Youngseok Lee,et al.  A multi-gigabit rate deep packet inspection algorithm using TCAM , 2005, GLOBECOM '05. IEEE Global Telecommunications Conference, 2005..

[34]  T. V. Lakshman,et al.  Fast and memory-efficient regular expression matching for deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[35]  T. V. Lakshman,et al.  Gigabit rate packet pattern-matching using TCAM , 2004, Proceedings of the 12th IEEE International Conference on Network Protocols, 2004. ICNP 2004..