SigMatch: Fast and Scalable Multi-Pattern Matching

Multi-pattern matching involves matching a data item against a large database of "signature" patterns. Existing algorithms for multi-pattern matching do not scale well as the size of the signature database increases. In this paper, we present sigMatch, a fast, versatile, and scalable technique for multi-pattern signature matching. At its heart, sigMatch organizes the signature database into a (processor) cache-efficient q-gram index structure called the sigTree. The sigTree groups patterns by common sub-patterns, so that signatures that cannot match are quickly eliminated from the matching process. To improve performance further, the sigTree also uses parallel Bloom filters and a technique that reduces imbalances across groups. In an extensive empirical evaluation across three diverse domains, we show that sigMatch often outperforms existing methods by an order of magnitude or more.
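
The filtering idea behind a q-gram index can be illustrated with a minimal Python sketch. It is not the sigTree itself: it leaves out the tree layout, the Bloom filters, and the group balancing, and the names `build_index` and `match`, the choice of q = 2, and the sample signatures are all assumptions made for illustration. Each signature is filed under its first q-gram, so a scan position whose q-gram is absent from the index needs no signature verification at all.

```python
from collections import defaultdict

Q = 2  # q-gram length; an illustrative choice, not a value from the paper


def build_index(signatures, q=Q):
    """Group signatures under their first q-gram (hypothetical helper)."""
    index = defaultdict(list)
    for sig in signatures:
        if len(sig) < q:
            raise ValueError("signature shorter than q")
        index[sig[:q]].append(sig)
    return index


def match(data, index, q=Q):
    """Return the set of signatures that occur somewhere in data."""
    found = set()
    for i in range(len(data) - q + 1):
        gram = data[i:i + q]
        # Positions whose q-gram is not in the index are skipped outright;
        # only signatures sharing this q-gram are verified in full.
        for sig in index.get(gram, ()):
            if data.startswith(sig, i):
                found.add(sig)
    return found


signatures = ["virus", "vile", "worm", "trojan"]
index = build_index(signatures)
hits = match("a harmless worm in a vial", index)
```

In this example the q-gram "vi" in "vial" triggers verification of "virus" and "vile", both of which fail, while "wo" leads to a confirmed match for "worm". Every other position is dismissed after a single dictionary lookup.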
