PERG: A scalable FPGA-based pattern-matching engine with consolidated Bloomier filters

PERG is an FPGA application that accelerates searching a stream of bytes against a large, fixed database of string patterns. The stream could be network, disk, or file traffic, while the pattern database may represent computer viruses, spam, keyword sequences, or watermarks. A full pattern, or rule, consists of a sequence of one or more segments separated by gaps. Each segment is an exact sequence of bytes, possibly hundreds of bytes long. Each gap contains arbitrary bytes but has a known length. PERG uses a pattern compiler to transform a database of these rules into a hardware implementation. To the authors' knowledge, this is the first pattern-matching hardware engine designed for large virus databases. It is also the first among network intrusion detection systems (NIDS), which are similar in nature to PERG, to implement Bloomier filters. Like hash tables, Bloomier filters produce false positives due to aliasing, so all potential matches must be verified by exact matching. However, Bloomier filters are more powerful than their ancestral Bloom filters because they can identify the exact rule of a potential match. This gives PERG two key advantages. First, it allows PERG to use a checksum to reduce false positives very efficiently. Second, exact matching with PERG filters is much faster than with Bloom filter systems because only one suspect pattern needs to be checked, not all patterns. To reduce memory requirements, PERG packs as many segments as possible into each Bloomier filter by consolidating several different segment lengths into the same filter unit. This is done by dividing each segment into two smaller but overlapping fragments of the same length. Dividing into non-overlapping fragments would instead create shorter fragments of uneven lengths, leading to more false positives and leaving differing lengths to consolidate later.
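The overlapping split described above can be illustrated with a minimal Python sketch. It is not PERG's pattern compiler: the function names, the fixed fragment length `frag_len`, and the restriction that a segment be longer than one fragment but shorter than two are all assumptions made for illustration.

```python
from typing import Tuple


def split_overlapping(segment: bytes, frag_len: int) -> Tuple[bytes, bytes]:
    """Split a segment into a prefix and a suffix fragment of frag_len bytes each.

    Assumption for this sketch: frag_len < len(segment) < 2 * frag_len, so the
    two fragments are shorter than the segment, overlap in the middle, and
    together cover every byte of it.
    """
    n = len(segment)
    if not frag_len < n < 2 * frag_len:
        raise ValueError("segment length must be strictly between frag_len and 2*frag_len")
    return segment[:frag_len], segment[n - frag_len:]


def split_disjoint(segment: bytes) -> Tuple[bytes, bytes]:
    """Non-overlapping split at the midpoint, shown only for comparison."""
    mid = len(segment) // 2
    return segment[:mid], segment[mid:]


if __name__ == "__main__":
    # Segments of length 5, 6, and 7 all reduce to two 4-byte fragments,
    # so one filter keyed on 4-byte fragments could hold all three lengths.
    for seg in (b"ABCDE", b"ABCDEF", b"ABCDEFG"):
        print(seg, split_overlapping(seg, 4), split_disjoint(seg))
```

With the overlapping split, every fragment has the same length regardless of the original segment length. The midpoint split gives fragments as short as 2 bytes here, and their lengths vary from segment to segment.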
Using the ClamAV antivirus database, PERG fits 80,282 patterns containing over 8,224,848 characters into a single modest FPGA chip with a small (4 MB) off-chip memory. It uses just 26 filter units, resulting in roughly 26x better density (characters per memory bit) than the next-best NIDS pattern-matching engine, which fits only 1/250th as many characters. PERG can scan at roughly 200 MB/s, matching the speed of most network or disk interfaces.
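As a point of reference for the rule format in the abstract (segments separated by gaps of known length), the following Python sketch shows what an exact match of one rule means in software. It is a naive reference matcher, not the hardware design. The `Rule` representation as a list of `(gap_before, segment)` pairs and the function names are hypothetical.

```python
from typing import List, Tuple

# Hypothetical rule encoding: each entry is (gap_before, segment), where
# gap_before is the number of arbitrary bytes preceding the segment.
# The first entry normally has gap_before == 0.
Rule = List[Tuple[int, bytes]]


def match_at(stream: bytes, start: int, rule: Rule) -> bool:
    """Return True if the whole rule matches the stream beginning at `start`."""
    pos = start
    for gap, segment in rule:
        pos += gap  # skip the fixed-length gap; its bytes may be anything
        if stream[pos:pos + len(segment)] != segment:
            return False
        pos += len(segment)
    return True


def find_rule(stream: bytes, rule: Rule) -> List[int]:
    """Return every start offset at which the rule matches exactly."""
    return [i for i in range(len(stream)) if match_at(stream, i, rule)]


if __name__ == "__main__":
    rule = [(0, b"EVIL"), (3, b"CODE")]  # "EVIL", any 3 bytes, then "CODE"
    print(find_rule(b"xxEVIL123CODEyy", rule))
    print(find_rule(b"EVIL12CODE", rule))  # gap is only 2 bytes, so no match
```

In a filter-based design such as the one the abstract describes, a check of this kind would only need to run for the single rule identified by the filter, rather than for every rule in the database.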
