A Hybrid Algorithm of Backward Hashing and Automaton Tracking for Virus Scanning

Virus scanning requires computationally intensive string matching against a large number of signatures with differing characteristics. This variety complicates the choice of matching algorithm, because each algorithm outperforms the others only for certain signature characteristics. We propose a hybrid approach for virus scanning in the open-source ClamAV that partitions the signatures into long and short ones. The Backward Hashing algorithm, an enhancement of the Wu-Manber (WM) algorithm, handles only the long patterns, which lengthens the average skip distance, while the Aho-Corasick (AC) algorithm scans for only the short patterns, which reduces the automaton size. The former uses the bad-block heuristic to exploit long shift distances and reduce the verification frequency, so it is much faster than the original WM implementation in ClamAV. The latter improves AC performance by around 50 percent owing to better cache locality. We also rank the factors that affect string matching performance by their importance.
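The partitioning idea can be illustrated with a minimal Python sketch. It is not the ClamAV code or the paper's Backward Hashing algorithm: the long-pattern scanner below is a plain Wu-Manber-style shift table without the bad-block heuristic, and the names `hybrid_scan`, `scan_long`, `scan_short`, the block size `BLOCK`, and the length `threshold` are all assumptions chosen for illustration.

```python
from collections import deque

BLOCK = 2  # block size used for hashing (assumed value, not from the paper)


def scan_long(text, patterns):
    """Wu-Manber-style scan: hash the last block of the window and skip ahead."""
    if not patterns:
        return set()
    m = min(len(p) for p in patterns)  # only the first m bytes drive the shifts
    shift, buckets = {}, {}
    for p in patterns:
        for i in range(m - BLOCK + 1):
            dist = m - BLOCK - i  # distance from this block to the prefix end
            block = p[i:i + BLOCK]
            shift[block] = min(shift.get(block, dist), dist)
        buckets.setdefault(p[m - BLOCK:m], []).append(p)
    default = m - BLOCK + 1  # block occurs in no pattern prefix: maximal skip
    hits, pos = set(), m     # pos is the exclusive end of the current window
    while pos <= len(text):
        block = text[pos - BLOCK:pos]
        s = shift.get(block, default)
        if s == 0:  # candidate alignment: verify every pattern in the bucket
            start = pos - m
            for p in buckets.get(block, ()):
                if text.startswith(p, start):
                    hits.add((start, p))
            s = 1
        pos += s
    return hits


def build_ac(patterns):
    """Build Aho-Corasick goto, failure, and output tables."""
    goto, fail, out = [{}], [0], [[]]
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].append(p)
    queue = deque(goto[0].values())  # depth-1 states fail to the root
    while queue:
        s = queue.popleft()
        for ch, t in goto[s].items():
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] = out[t] + out[fail[t]]  # inherit matches ending at suffix
            queue.append(t)
    return goto, fail, out


def scan_short(text, patterns):
    """Aho-Corasick scan: one automaton transition per input byte."""
    goto, fail, out = build_ac(patterns)
    hits, s = set(), 0
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.add((i - len(p) + 1, p))
    return hits


def hybrid_scan(text, signatures, threshold=8):
    """Route long signatures to the shift-based scanner, short ones to AC."""
    long_sigs = [s for s in signatures if len(s) >= threshold]
    short_sigs = [s for s in signatures if len(s) < threshold]
    return scan_long(text, long_sigs) | scan_short(text, short_sigs)
```

Calling `hybrid_scan(data, signatures)` on a `bytes` buffer returns a set of `(offset, signature)` pairs. Keeping short signatures out of the shift-based scanner matters because its maximum skip is bounded by the shortest pattern it holds, while keeping long signatures out of the automaton keeps the state count small; `threshold` must be at least `BLOCK` for the sketch to work.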
