Hardware Accelerator to Detect Multi-Segment Virus Patterns

Multi-segment pattern is a common virus structure, and there are 2229 multi-segment patterns in the ClamAV virus database version 54. We observe that (i) the pattern set contains over 100 nondistinctive short segments, e.g. 2 bytes of zero; (ii) some of the 2-byte segments can appear many times in one or more patterns; (iii) some patterns contain a large number of 2-byte segments; (iv) many short segments are substrings/suffixes of other longer segments; and (v) adjacent segments may contain overlapping bytes. The aforementioned properties pose great difficulties to the conventional detection methods. Instead of viewing the virus signature as a byte sequence, we regard the pattern to be composed of a sequence of tokens, where each token corresponds to a segment. We transform the input byte stream into a token stream. The detection engine will then process the token stream to determine if any virus signatures can be found. Our detection method for the 2229 multi-segment patterns can be implemented on a field programmable gate array (FPGA) using 290KB on-chip memory. The device can operate at 170MHz and it can process 1 byte per cycle. The processing architecture is memory based. When the pattern set is updated, the FPGA need not be reconfigured.

[1]  Udi Manber,et al.  A FAST ALGORITHM FOR MULTI-PATTERN SEARCHING , 1999 .

[2]  Bin Liu,et al.  A memory-efficient pipelined implementation of the aho-corasick string-matching algorithm , 2010, TACO.

[3]  Xing Wang,et al.  String Searching Engine for Virus Scanning , 2011, IEEE Transactions on Computers.

[4]  Somesh Jha,et al.  XFA: Faster Signature Matching with Extended Automata , 2008, 2008 IEEE Symposium on Security and Privacy (sp 2008).

[5]  Sotiris Ioannidis,et al.  Regular Expression Matching on Graphics Hardware for Intrusion Detection , 2009, RAID.

[6]  Patrick Crowley,et al.  A hybrid finite automaton for practical deep packet inspection , 2007, CoNEXT '07.

[7]  Wei Lin,et al.  Pipelined Architecture for Multi-String Matching , 2008, IEEE Computer Architecture Letters.

[8]  Guy Lemieux,et al.  PERG: A scalable FPGA-based pattern-matching engine with consolidated Bloomier filters , 2008, 2008 International Conference on Field-Programmable Technology.

[9]  Jan van Lunteren,et al.  High-Performance Pattern-Matching for Intrusion Detection , 2006, INFOCOM.

[10]  Yuan-Cheng Lai,et al.  Hardware-Software Codesign for High-Speed Signature-based Virus Scanning , 2009, IEEE Micro.

[11]  Guy Lemieux,et al.  PERG-Rx: a hardware pattern-matching engine supporting limited regular expressions , 2009, FPGA '09.

[12]  Jan van Lunteren,et al.  Hardware-accelerated regular expression matching at multiple tens of Gb/s , 2012, 2012 Proceedings IEEE INFOCOM.

[13]  H. Jonathan Chao,et al.  Scalable Lookahead Regular Expression Detection System for Deep Packet Inspection , 2012, IEEE/ACM Transactions on Networking.

[14]  Patrick Crowley,et al.  Algorithms to accelerate multiple regular expressions matching for deep packet inspection , 2006, SIGCOMM 2006.

[15]  Stamatis Vassiliadis,et al.  Regular Expression Matching in Reconfigurable Hardware , 2008, J. Signal Process. Syst..

[16]  Derek Pao,et al.  Design of a near-minimal dynamic perfect hash function on embedded device , 2013, 2013 15th International Conference on Advanced Communications Technology (ICACT).

[17]  Peng Zhou,et al.  Efficient packet classification using TCAMs , 2006, Comput. Networks.

[18]  Charlie Johnson,et al.  IBM Power Edge of Network Processor: A Wire-Speed System on a Chip , 2011, IEEE Micro.

[19]  George Varghese,et al.  Curing regular expressions matching algorithms from insomnia, amnesia, and acalculia , 2007, ANCS '07.

[20]  Carla E. Brodley,et al.  Offloading IDS Computation to the GPU , 2006, 2006 22nd Annual Computer Security Applications Conference (ACSAC'06).

[21]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[22]  Diomidis Spinellis,et al.  Reliable identification of bounded-length viruses is NP-complete , 2003, IEEE Trans. Inf. Theory.

[23]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[24]  Li Guo,et al.  An efficient regular expressions compression algorithm from a new perspective , 2011, 2011 Proceedings IEEE INFOCOM.

[25]  Viktor K. Prasanna,et al.  Compact architecture for high-throughput regular expression matching on FPGA , 2008, ANCS '08.

[26]  Somesh Jha,et al.  Deflating the big bang: fast and scalable deep packet inspection with extended finite automata , 2008, SIGCOMM '08.

[27]  Eric Torng,et al.  Fast Regular Expression Matching Using Small TCAM , 2014, IEEE/ACM Transactions on Networking.

[28]  Sotiris Ioannidis,et al.  GrAVity: A Massively Parallel Antivirus Engine , 2010, RAID.

[29]  Ray C. C. Cheung,et al.  A memory-based NFA regular expression match engine for signature-based intrusion detection , 2013, Comput. Commun..

[30]  Viktor K. Prasanna,et al.  Space-time tradeoff in regular expression matching with semi-deterministic finite automata , 2011, 2011 Proceedings IEEE INFOCOM.

[31]  John W. Lockwood,et al.  Application of Hardware Accelerated Extensible Network Nodes for Internet Worm and Virus Protection , 2003, IWAN.

[32]  Xing Wang,et al.  Multi-Stride String Searching for High-Speed Content Inspection , 2012, Comput. J..

[33]  Tsern-Huei Lee Hardware Architecture for High-Performance Regular Expression Matching , 2009, IEEE Transactions on Computers.