CompactDFA: Generic State Machine Compression for Scalable Pattern Matching

Pattern-matching algorithms lie at the core of all contemporary Intrusion Detection Systems (IDSs), making it essential to reduce their time and memory requirements. This paper focuses on the most popular class of pattern-matching algorithms, the Aho-Corasick-like algorithms, which construct and traverse a Deterministic Finite Automaton (DFA) that represents the patterns. While this approach provides deterministic time guarantees, modern IDSs need to deal with hundreds of patterns and therefore must store very large DFAs, which usually do not fit in fast memory. This creates a major bottleneck for the throughput of the IDS, and increases its power consumption and cost. We propose a novel method to compress DFAs, based on the observation that the names of the states are meaningless. While regular DFAs store each transition between two states separately, we use this degree of freedom to encode states in such a way that all transitions to a specific state can be represented by a single prefix that defines a set of current states. Our technique applies to a large class of automata, which can be characterized by simple properties. The problem of pattern matching is thereby reduced to the well-studied problem of Longest Prefix Matching (LPM), which can be solved in TCAM, in commercially available IP-lookup chips, or in software. Specifically, we show that with a TCAM our scheme can reach a throughput of 10 Gbps with low power consumption.
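The reduction to LPM can be illustrated with a minimal Python sketch. It is not the paper's encoding: as a simplifying assumption, each state's code is taken to be its label (the pattern prefix it represents) written in reverse, instead of a compact binary code suited to a TCAM. Under that assumption, every transition into a state collapses into one rule, (input symbol, prefix of the current state's code), and the next state is found by a longest prefix match. The names `build_rules`, `step`, and `scan` are hypothetical.

```python
def build_rules(patterns):
    """Return one rule per non-root state of the Aho-Corasick automaton.

    A state is identified by its label, i.e. a prefix of some pattern.
    Assumption: the state's code is simply the reversed label, so that
    "current label ends with t" becomes "current code starts with t reversed".
    """
    states = {""}
    for p in patterns:
        for i in range(1, len(p) + 1):
            states.add(p[:i])
    rules = {}
    for s in states:
        if s:
            # All transitions into s read symbol s[-1] from a state whose
            # label ends with s[:-1]; one prefix rule covers them all.
            rules[(s[-1], s[:-1][::-1])] = s
    return rules


def step(rules, state, symbol):
    """Next state = longest prefix match on (symbol, code of current state)."""
    code = state[::-1]
    for k in range(len(code), -1, -1):
        nxt = rules.get((symbol, code[:k]))
        if nxt is not None:
            return nxt
    return ""  # no rule matches: fall back to the root state


def scan(patterns, text):
    """Report (start index, pattern) for every occurrence in text."""
    rules = build_rules(patterns)
    state, hits = "", []
    for i, ch in enumerate(text):
        state = step(rules, state, ch)
        for p in patterns:
            if state.endswith(p):
                hits.append((i - len(p) + 1, p))
    return hits


if __name__ == "__main__":
    print(scan(["he", "she", "his", "hers"], "ushers"))
```

For the four patterns above the automaton has ten states, and the sketch stores nine rules (one per non-root state) rather than one entry per state and input symbol. The linear search in `step` stands in for the single lookup that a TCAM or an IP-lookup structure would perform.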
