Reorganized and Compact DFA for Efficient Regular Expression Matching

Regular expression matching has become a critical yet challenging technique in content-aware network processing, such as application identification and deep packet inspection. To process heavy network traffic at line rate, Deterministic Finite Automata (DFAs) are widely used to accelerate regular expression matching, at the expense of large memory usage. In this paper, we propose a DFA-based algorithm named RCDFA (Reorganized and Compact DFA), which dramatically reduces memory usage while maintaining fast and deterministic lookup speed. By dissecting real-life DFA tables, we observe that almost every state has multiple similar states, i.e., states that share identical next-state transitions for most input characters. However, these similar states are often located at nonadjacent positions in the original DFA table. RCDFA reorganizes all similar states into adjacent entries, so that identical transitions become consecutive along the state dimension, and then compresses the reorganized DFA table using a bitmap technique. Coupled with mapping along the character dimension, RCDFA is not only efficient in DFA compression but also well suited to hardware implementation. Experimental results show that RCDFA performs well in terms of processing speed, memory usage, and preprocessing time. RCDFA consistently achieves a compression ratio of over 95% on existing real-life rule sets. Implemented on a single Xilinx Virtex-6 FPGA platform, the RCDFA matching engine achieves 12 Gbps throughput.
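
The reorganize-then-bitmap idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy ordering heuristic, the per-column bitmap layout, and all function names (`reorder_states`, `compress_column`, `lookup`, `build`, `step`) are assumptions made for illustration, and the character-dimension mapping is omitted.

```python
def similarity(a, b):
    """Number of input characters on which two transition rows agree."""
    return sum(x == y for x, y in zip(a, b))


def reorder_states(table):
    """Greedy chain: follow each state with its most similar unplaced state.

    Hypothetical heuristic for illustration; the paper's reorganization
    step may work differently.
    """
    order = [0]
    remaining = set(range(1, len(table)))
    while remaining:
        last = table[order[-1]]
        best = max(sorted(remaining), key=lambda s: similarity(last, table[s]))
        order.append(best)
        remaining.remove(best)
    return order


def compress_column(column):
    """Bitmap-compress one character column along the state dimension.

    bits[i] is 1 when row i starts a new run of identical transitions;
    kept stores a single entry per run.
    """
    bits, kept = [], []
    for i, value in enumerate(column):
        new_run = i == 0 or value != column[i - 1]
        bits.append(1 if new_run else 0)
        if new_run:
            kept.append(value)
    return bits, kept


def lookup(bits, kept, row):
    """Recover the entry at `row` by counting set bits up to and including it."""
    return kept[sum(bits[: row + 1]) - 1]


def build(table):
    """Reorder states, renumber transitions, and compress every column."""
    order = reorder_states(table)
    new_id = {old: new for new, old in enumerate(order)}
    rows = [[new_id[t] for t in table[old]] for old in order]
    columns = [compress_column(list(col)) for col in zip(*rows)]
    return new_id, columns


def step(columns, state, char):
    """One DFA transition on the compressed table (state uses the new numbering)."""
    bits, kept = columns[char]
    return lookup(bits, kept, state)


# Toy table (assumed data): states 0 and 2 are identical, as are 1 and 3,
# but they sit at nonadjacent rows in the original ordering.
toy = [[1, 0, 0], [1, 2, 3], [1, 0, 0], [1, 2, 3]]
new_id, columns = build(toy)
stored = sum(len(kept) for _, kept in columns)
print(f"stored entries: {stored} of {len(toy) * len(toy[0])}")
```

In hardware, the prefix sum inside `lookup` would typically be a population count over a bitmap word; the linear `sum` here only keeps the sketch short.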
