An efficient regular expressions compression algorithm from a new perspective

Deep packet inspection plays a increasingly important role in network security devices and applications, which use more regular expressions to depict patterns. DFA engine is usually used as a classical representation for regular expressions to perform pattern matching, because it only need O(1) time to process one input character. However, DFAs of regular expression sets require large amount of memory, which limits the practical application of regular expressions in high-speed networks. Some compression algorithms have been proposed to address this issue in recent literatures. In this paper, we reconsider this problem from a new perspective, namely observing the characteristic of transition distribution inside each state, which is different from previous algorithms that observe transition characteristic among states. Furthermore, we introduce a new compression algorithm which can reduce 95% memory usage of DFA stably without significant impact on matching speed. Moreover, our work is orthogonal to previous compression algorithms, such as D2FA, δFA. Our experiment results show that applying our work to them will have several times memory reduction, and matching speed of up to dozens of times comparing with original δFA in software implementation.

[1]  Ken Thompson,et al.  Programming Techniques: Regular expression search algorithm , 1968, Commun. ACM.

[2]  Alfred V. Aho,et al.  Compilers: Principles, Techniques, and Tools , 1986, Addison-Wesley series in computer science / World student series edition.

[3]  Eugene W. Myers,et al.  A Four Russians algorithm for regular expression pattern matching , 1992, JACM.

[4]  Vern Paxson,et al.  Bro: a system for detecting network intruders in real-time , 1998, Comput. Networks.

[5]  Martin Roesch,et al.  Snort - Lightweight Intrusion Detection for Networks , 1999 .

[6]  Gonzalo Navarro,et al.  Compact DFA Representation for Fast Regular Expression Search , 2001, Algorithm Engineering.

[7]  Ron K. Cytron,et al.  A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).

[8]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM.

[9]  T. V. Lakshman,et al.  Fast and memory-efficient regular expression matching for deep packet inspection , 2006, 2006 Symposium on Architecture For Networking And Communications Systems.

[10]  Patrick Crowley,et al.  An improved algorithm to accelerate regular expression evaluation , 2007, ANCS '07.

[11]  Nen-Fu Huang,et al.  A Novel Algorithm and Architecture for High Speed Pattern Matching in Resource-Limited Silicon Solution , 2007, 2007 IEEE International Conference on Communications.

[12]  Wei Zhang,et al.  A Memory Efficient Multiple Pattern Matching Architecture for Network Security , 2008, IEEE INFOCOM 2008 - The 27th Conference on Computer Communications.

[13]  Patrick Crowley,et al.  A workload for evaluating deep packet inspection architectures , 2008, 2008 IEEE International Symposium on Workload Characterization.

[14]  Somesh Jha,et al.  Deflating the big bang: fast and scalable deep packet inspection with extended finite automata , 2008, SIGCOMM '08.

[15]  Randy Smith,et al.  Efficient signature matching with multiple alphabet compression tables , 2008, SecureComm.

[16]  Somesh Jha,et al.  XFA: Faster Signature Matching with Extended Automata , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[17]  Stefano Giordano,et al.  An improved DFA for fast regular expression matching , 2008, CCRV.

[18]  Patrick Crowley,et al.  Efficient regular expression evaluation: theory to practice , 2008, ANCS '08.

[19]  T. V. Lakshman,et al.  Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection , 2009, IEEE INFOCOM 2009.