Memory-Based Hardware Architectures to Detect ClamAV Virus Signatures with Restricted Regular Expression Features

We aim to implement a single-chip hardware detection engine for virus scanning. Our study is based on the ClamAV virus database, which contains 88.9K strings and 9.6K extended hex-signatures with restricted regular expression (regex) features. We have previously presented cost-effective hardware architectures to detect the 88.9K strings and the 3.2K regex patterns that are composed of multiple string segments. In this paper, we present hardware architectures to detect the remaining 6.4K regex patterns. Our method is based on the information reduction approach: we transform the byte-oriented matching problem into a token-based matching problem. A regex pattern contains one or more segments, and a segment may be subdivided into multiple non-trivial tokens. In general, a token is associated with only one or a few regexes. Dedicated hardware units convert the input byte-stream into a token-stream, in which the number of tokens is much smaller than the number of bytes. The token-stream is processed by an NFA-based aggregation unit to determine whether any segment is present. Detected segments are further processed by a scoreboard to determine whether any multi-segment pattern is present. As a proof of concept, our method is implemented on a Virtex-6 FPGA and consumes 1.84 MB of on-chip memory.
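The three-stage flow described above (byte-stream to token-stream, token aggregation into segments, scoreboard over segments) can be illustrated with a small software sketch. This is not the hardware design of the paper: the token table, segment definitions, pattern definitions, and the function names `tokenize`, `aggregate`, `scoreboard`, and `scan` are all hypothetical, and the sketch assumes that tokens within a segment must be byte-contiguous and that segments of a pattern only need to appear in order.

```python
# Toy software model of the token-based matching flow (all data is hypothetical).
TOKENS = {b"\xde\xad": 0, b"\xbe\xef": 1, b"\xca\xfe": 2}   # token bytes -> token ID
SEGMENTS = {"S0": [0, 1], "S1": [2]}                          # segment -> ordered token IDs
PATTERNS = {"P0": ["S0", "S1"]}                               # pattern -> ordered segments


def tokenize(data):
    """Convert a byte string into (start, end, token_id) events ordered by end offset."""
    hits = []
    for tok, tid in TOKENS.items():
        pos = data.find(tok)
        while pos != -1:
            hits.append((pos + len(tok), pos, tid))
            pos = data.find(tok, pos + 1)
    hits.sort()
    return [(start, end, tid) for end, start, tid in hits]


def aggregate(token_stream):
    """NFA-style aggregation: several partial segment matches may be active at once."""
    active = set()   # (segment name, index of next expected token, expected start offset)
    found = []       # (segment name, end offset) for every completed segment
    for start, end, tid in token_stream:
        advanced = set()
        for name, seq in SEGMENTS.items():
            # A new thread may start on this token, or an existing one may advance.
            steps = [0] if seq[0] == tid else []
            steps += [i for (n, i, off) in active
                      if n == name and off == start and seq[i] == tid]
            for i in steps:
                if i + 1 == len(seq):
                    found.append((name, end))
                else:
                    advanced.add((name, i + 1, end))
        active |= advanced
    return found


def scoreboard(segment_events):
    """Report patterns whose segments were all seen in the required order."""
    progress = {p: 0 for p in PATTERNS}
    matched = []
    for name, _end in segment_events:
        for p, segs in PATTERNS.items():
            k = progress[p]
            if k < len(segs) and segs[k] == name:
                progress[p] = k + 1
                if k + 1 == len(segs):
                    matched.append(p)
    return matched


def scan(data):
    """Run the full toy pipeline on a byte string."""
    return scoreboard(aggregate(tokenize(data)))
```

For example, `scan(b"\x00\xde\xad\xbe\xef\x11\x22\xca\xfe")` reports `["P0"]`, while an input in which the two tokens of `S0` are separated by a stray byte reports no pattern. The point of the sketch is the information reduction: the later stages see only a handful of token events rather than every input byte.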
