A Parallel Algorithm of Multiple String Matching Based on Set-Partition in Multi-core Architecture

With the arrival of the big data era, large-scale data processing faces new challenges. String matching still plays an important role in network security and information retrieval, but large pattern sets incur heavy memory overhead and long memory-access times. Adapting string matching algorithms to large-scale tasks is therefore both desirable and meaningful. In this paper, we present and implement a parallel multiple string matching algorithm for multi-core platforms. The work focuses on partitioning the pattern set with a genetic algorithm that exploits the internal relations among patterns, so as to reduce memory overhead and improve execution performance. Compared with classical algorithms, our experiments on both high and low hit-rate data show that the proposed algorithm improves performance by about 20%-40% on average and reduces memory cost by 4%-20% on average.
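The abstract does not specify the chromosome encoding, the fitness function, or the matching engine. The Python sketch below therefore fills them in with assumptions of our own. Each chromosome assigns every pattern to one of k groups. The fitness (`cost`) is the total number of trie nodes across groups, used as a rough memory proxy because patterns that share prefixes share nodes, plus a penalty for unbalanced groups. Each group is then matched against the whole text by a plain trie walk, which stands in for whatever automaton a real implementation would build. All names and parameters (`ga_partition`, `balance_weight`, `mut_rate`, `parallel_match`, etc.) are hypothetical. The thread pool only illustrates the one-group-per-worker structure: CPython's global interpreter lock keeps pure-Python threads from running on several cores at once, so a genuine multi-core version would use native threads or processes.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def trie_size(patterns):
    """Trie nodes (excluding the root) for one group: one node per distinct
    non-empty prefix. Used here as an assumed memory proxy."""
    return len({p[:i] for p in patterns for i in range(1, len(p) + 1)})


def groups_of(genes, patterns, k):
    """Decode a chromosome (genes[i] = group index of patterns[i]) into k groups."""
    groups = [[] for _ in range(k)]
    for g, p in zip(genes, patterns):
        groups[g].append(p)
    return groups


def cost(genes, patterns, k, balance_weight=2.0):
    """Hypothetical fitness, lower is better: total trie nodes plus an imbalance penalty."""
    sizes = [trie_size(grp) for grp in groups_of(genes, patterns, k)]
    return sum(sizes) + balance_weight * (max(sizes) - min(sizes))


def ga_partition(patterns, k, pop_size=30, generations=60, mut_rate=0.1, seed=0):
    """Genetic algorithm searching for a low-cost partition of the patterns into k groups."""
    rng = random.Random(seed)
    n = len(patterns)
    fit = lambda genes: cost(genes, patterns, k)
    pop = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit)
        nxt = pop[:2]  # elitism: carry the two best chromosomes over unchanged
        while len(nxt) < pop_size:
            # tournament selection (size 3) for each parent
            a = min(rng.sample(pop, 3), key=fit)
            b = min(rng.sample(pop, 3), key=fit)
            # uniform crossover: each gene comes from either parent
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # mutation: move a pattern to a random group
            for i in range(n):
                if rng.random() < mut_rate:
                    child[i] = rng.randrange(k)
            nxt.append(child)
        pop = nxt
    return groups_of(min(pop, key=fit), patterns, k)


def build_trie(patterns):
    root = {}
    for p in patterns:
        node = root
        for ch in p:
            node = node.setdefault(ch, {})
        node[None] = p  # None marks a pattern end; it can never be a character key
    return root


def match_group(trie, text):
    """Report (position, pattern) for every occurrence by walking the trie from each start."""
    hits = []
    for start in range(len(text)):
        node = trie
        for j in range(start, len(text)):
            node = node.get(text[j])
            if node is None:
                break
            if None in node:
                hits.append((start, node[None]))
    return hits


def parallel_match(groups, text, workers=None):
    """Match every non-empty group against the full text, one task per group, then merge."""
    tries = [build_trie(g) for g in groups if g]
    with ThreadPoolExecutor(max_workers=workers or len(tries) or 1) as pool:
        results = list(pool.map(lambda t: match_group(t, text), tries))
    return sorted(hit for hits in results for hit in hits)


patterns = ["attack", "attach", "atom", "virus", "viral", "worm", "work", "word"]
groups = ga_partition(patterns, k=2)
print(parallel_match(groups, "the viral worm tried to attach"))
# [(4, 'viral'), (10, 'worm'), (24, 'attach')]
```

Grouping prefix-sharing patterns together keeps the total trie size close to that of a single trie, while the balance term stops the search from collapsing everything into one group and leaving the other workers idle. The reported matches do not depend on which partition the GA finds, because every pattern lands in exactly one group.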
