Revisiting State Blow-Up: Automatically Building Augmented-FA While Preserving Functional Equivalence

Regular expression matching, a central task in deep packet inspection and other networking applications, has been traditionally implemented through finite automata. Thanks to their limited per-character processing and memory bandwidth requirements, deterministic finite automata (DFA) are a natural choice for memory-based implementations. In the presence of large datasets of complex patterns, however, DFA suffer from the well-known state explosion problem. Specifically, state explosion can take place during DFA generation when the considered patterns contain bounded and unbounded repetitions of wildcards or large character sets. Several alternative FA representations have been proposed to address this problem. However, these proposals all suffer from one or more of the following problems: some can avoid state explosion only on datasets of limited size and complexity; some have prohibitive worst-case memory bandwidth requirements or processing time; and some can only guarantee functional equivalence for restricted classes of regular expressions and require the user to manually filter out unsupported patterns. In this work we propose JFA, a finite automation that uses state variables to avoid state explosion, and is functionally equivalent to the corresponding DFA. Functional equivalence is guaranteed by construction without requiring user intervention. We also provide optimization techniques to both limit the amount of state variables required and provide a lower bound for the JFA traversal time.

[1]  Somesh Jha,et al.  Deflating the big bang: fast and scalable deep packet inspection with extended finite automata , 2008, SIGCOMM '08.

[2]  Somesh Jha,et al.  XFA: Faster Signature Matching with Extended Automata , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[3]  Xiaodong Yu,et al.  GPU acceleration of regular expression matching for large datasets: exploring the implementation space , 2013, CF '13.

[4]  Srihari Cadambi,et al.  Memory-Efficient Regular Expression Search Using State Merging , 2007, IEEE INFOCOM 2007 - 26th IEEE International Conference on Computer Communications.

[5]  Eric Torng,et al.  Bit weaving: A non-prefix approach to compressing packet classifiers in TCAMs , 2009, 2009 17th IEEE International Conference on Network Protocols.

[6]  Patrick Crowley,et al.  A workload for evaluating deep packet inspection architectures , 2008, 2008 IEEE International Symposium on Workload Characterization.

[7]  Min Chen,et al.  Chain-Based DFA Deflation for Fast and Scalable Regular Expression Matching Using TCAM , 2011, 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems.

[8]  Patrick Crowley,et al.  Efficient regular expression evaluation: theory to practice , 2008, ANCS '08.

[9]  Viktor K. Prasanna,et al.  Fast Regular Expression Matching Using FPGAs , 2001, The 9th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'01).

[10]  Patrick Crowley,et al.  A hybrid finite automaton for practical deep packet inspection , 2007, CoNEXT '07.

[11]  Stefano Giordano,et al.  An improved DFA for fast regular expression matching , 2008, CCRV.

[12]  George Varghese,et al.  Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia , 2007, ANCS '07.

[13]  Patrick Crowley,et al.  An improved algorithm to accelerate regular expression evaluation , 2007, ANCS '07.

[14]  Alexander L. Wolf,et al.  A routing scheme for content-based networking , 2004, IEEE INFOCOM 2004.

[15]  T. V. Lakshman,et al.  Gigabit rate packet pattern-matching using TCAM , 2004, Proceedings of the 12th IEEE International Conference on Network Protocols, 2004. ICNP 2004..

[16]  T. V. Lakshman,et al.  Fast and memory-efficient regular expression matching for deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[17]  Christopher R. Clark,et al.  Modeling the data-dependent performance of pattern-matching architectures , 2006, FPGA '06.

[18]  Ron K. Cytron,et al.  A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, ISCA 2006.

[19]  Laxmi N. Bhuyan,et al.  Compiling PCRE to FPGA for accelerating SNORT IDS , 2007, ANCS '07.

[20]  Hao Wang,et al.  MIN-MAX: A Counter-Based Algorithm for Regular Expression Matching , 2013, IEEE Transactions on Parallel and Distributed Systems.

[21]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM 2006.

[22]  Eric Torng,et al.  Fast Regular Expression Matching Using Small TCAMs for Network Intrusion Detection and Prevention Systems , 2010, USENIX Security Symposium.

[23]  Eric Torng,et al.  Bypassing Space Explosion in High-Speed Regular Expression Matching , 2014, IEEE/ACM Transactions on Networking.

[24]  Viktor K. Prasanna,et al.  FEACAN: Front-end acceleration for content-aware network processing , 2011, 2011 Proceedings IEEE INFOCOM.

[25]  Ming Yang,et al.  GPU-based NFA implementation for memory efficient high speed regular expression matching , 2012, PPoPP '12.

[26]  Patrick Crowley,et al.  Extending finite automata to efficiently match Perl-compatible regular expressions , 2008, CoNEXT '08.

[27]  Niccolo Cascarano,et al.  iNFAnt: NFA pattern matching on GPGPU devices , 2010, CCRV.