String Searching Engine for Virus Scanning

A memory-efficient hardware string searching engine for antivirus applications is presented. The proposed QSV method is based on quick sampling of the input stream against fixed-length pattern prefixes, and on-demand verification of variable-length pattern suffixes. Patterns handled by the QSV method are required to have at least 16 bytes, and possess distinct 16-byte prefixes. The latter requirement can be fulfilled by a preprocessing procedure. The search engine uses the pipelined Aho-Corasick (P-AC) architecture developed by the first author to process 4 to 15-byte short patterns and a small number of exception cases. Our design was evaluated using the ClamAV virus database having 82,888 strings with a total size that exceeds 8 Mbyte. In terms of byte count, 99.3 percent of the pattern set is handled by the QSV method and 0.7 percent of the pattern set is handled by P-AC. A pattern with distinct 16-byte prefix only occupies up to three lookup table entries in QSV. The overall memory cost of our system is about 1.4 Mbyte, i.e., 1.4 bit per character of the ClamAV pattern set. The proposed method is memory-based, hence, updates to the pattern set can be accommodated by modifying the contents of the lookup tables without reconfiguring the hardware circuits.

[1]  Dionisios N. Pnevmatikatos,et al.  Pre-decoded CAMs for efficient and high-speed NIDS pattern matching , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[2]  Wei Zhang,et al.  A Memory Efficient Multiple Pattern Matching Architecture for Network Security , 2008, IEEE INFOCOM 2008 - The 27th Conference on Computer Communications.

[3]  Stefano Giordano,et al.  Divide and discriminate: algorithm for deterministic and fast hash lookups , 2009, ANCS '09.

[4]  Wei Lin,et al.  Pipelined Architecture for Multi-String Matching , 2008, IEEE Computer Architecture Letters.

[5]  Guy Lemieux,et al.  PERG: A scalable FPGA-based pattern-matching engine with consolidated Bloomier filters , 2008, 2008 International Conference on Field-Programmable Technology.

[6]  Jan van Lunteren,et al.  High-Performance Pattern-Matching for Intrusion Detection , 2006, INFOCOM.

[7]  T. V. Lakshman,et al.  Gigabit rate packet pattern-matching using TCAM , 2004, Proceedings of the 12th IEEE International Conference on Network Protocols, 2004. ICNP 2004..

[8]  Dionisios N. Pnevmatikatos,et al.  Hashing + memory = low cost, exact pattern matching , 2005, International Conference on Field Programmable Logic and Applications, 2005..

[9]  T. V. Lakshman,et al.  Variable-Stride Multi-Pattern Matching For Scalable Deep Packet Inspection , 2009, IEEE INFOCOM 2009.

[10]  Viktor K. Prasanna,et al.  Automatic Synthesis of Efficient Intrusion Detection Systems on FPGAs , 2006, IEEE Trans. Dependable Secur. Comput..

[11]  Derek Chi-Wai Pao A NFA-based programmable regular expression match engine , 2009, ANCS '09.

[12]  Timothy Sherwood,et al.  Bit-split string-matching engines for intrusion detection and prevention , 2006, TACO.

[13]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[14]  John W. Lockwood,et al.  Deep packet inspection using parallel bloom filters , 2004, IEEE Micro.

[15]  Vijay Kumar,et al.  High Speed Pattern Matching for Network IDS/IPS , 2006, Proceedings of the 2006 IEEE International Conference on Network Protocols.

[16]  Jan van Lunteren Searching very large routing tables in wide embedded memory , 2001, GLOBECOM.

[17]  Viktor K. Prasanna,et al.  A computationally efficient engine for flexible intrusion detection , 2005, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[18]  George Varghese,et al.  Deterministic memory-efficient string matching algorithms for intrusion detection , 2004, IEEE INFOCOM 2004.

[19]  Bin Liu,et al.  A memory-efficient pipelined implementation of the aho-corasick string-matching algorithm , 2010, TACO.

[20]  Dionisios N. Pnevmatikatos,et al.  A Memory-Efficient Reconfigurable Aho-Corasick FSM Implementation for Intrusion Detection Systems , 2007, 2007 International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation.

[21]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[22]  Christopher R. Clark,et al.  Efficient Reconfigurable Logic Circuits for Matching Complex Network Intrusion Detection Patterns , 2003, FPL.

[23]  William H. Mangione-Smith,et al.  Fast reconfiguring deep packet filter for 1+ gigabit network , 2005, 13th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'05).

[24]  Stamatis Vassiliadis,et al.  Scalable Multigigabit Pattern Matching for Packet Inspection , 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

[25]  John W. Lockwood,et al.  Fast and Scalable Pattern Matching for Network Intrusion Detection Systems , 2006, IEEE Journal on Selected Areas in Communications.

[26]  송왕철,et al.  IDS(Intrusion Detection System) , 2000 .