High Throughput Parallel Implementation of Aho-Corasick Algorithm on a GPU

Pattern matching is an important operation in many applications, including computer and network security, bioinformatics, and image processing. The Aho-Corasick (AC) algorithm is a multi-pattern matching algorithm commonly used in such applications. Because these applications impose demanding performance requirements, a high-performance AC implementation is crucial. In this paper, we present a high-performance parallel implementation of the AC algorithm on a Graphics Processing Unit (GPU) that exploits the GPU's high degree of on-chip parallelism and its memory hierarchy to maximize aggregate throughput. To this end, our approach carefully places and caches the input text data and the reference pattern data used for matching in the GPU's on-chip shared memories and texture caches. Furthermore, it schedules the off-chip global memory loads and the shared memory stores so as to minimize both the overhead of loading the input data into the shared memories and the shared memory bank conflicts. The proposed approach significantly reduces the effective memory access latencies and thereby improves performance substantially. Experimental results on an Nvidia GeForce GTX 285 GPU show that our approach delivers up to 127 Gbps throughput and up to a 222-fold speedup over a serial version running on a 2.2 GHz Intel Core 2 Duo processor.
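As a rough illustration of how AC matching can be divided among parallel workers, the following minimal Python sketch builds an AC automaton and scans the text in independent chunks. The chunking scheme (each chunk extended by the longest pattern length minus one, so that matches straddling a boundary are not lost) is an assumption made for illustration and is not taken from the abstract; the function names `build_automaton`, `search`, and `search_chunked` are likewise hypothetical. On a GPU, each chunk would be handled by a separate thread reading its text from shared memory, which this serial sketch does not model.

```python
from collections import deque

def build_automaton(patterns):
    """Build an Aho-Corasick automaton as (goto, fail, out) lists indexed by state."""
    goto, fail, out = [{}], [0], [set()]
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: compute failure links and merge output sets.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def search(automaton, text, start=0, end=None):
    """Scan text[start:end] and return a set of (start_position, pattern) matches."""
    goto, fail, out = automaton
    end = len(text) if end is None else end
    state, hits = 0, set()
    for i in range(start, end):
        ch = text[i]
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.add((i - len(pat) + 1, pat))
    return hits

def search_chunked(automaton, text, chunk_size, max_len):
    """Scan fixed-size chunks independently (assumed partitioning scheme).

    Each chunk is extended by max_len - 1 characters so that a match starting
    inside the chunk but ending past its boundary is still found. A match is
    credited only to the chunk in which it starts, which avoids duplicates.
    """
    hits = set()
    for s in range(0, len(text), chunk_size):
        e = min(len(text), s + chunk_size + max_len - 1)
        hits |= {(pos, pat) for pos, pat in search(automaton, text, s, e)
                 if pos < s + chunk_size}
    return hits

patterns = ["he", "she", "his", "hers"]
automaton = build_automaton(patterns)
serial = search(automaton, "ushers")
chunked = search_chunked(automaton, "ushers", chunk_size=2,
                         max_len=max(len(p) for p in patterns))
```

Because every chunk starts from the automaton's root state and depends only on its own slice of the text, the loop iterations in `search_chunked` are independent of one another, which is the property a parallel implementation relies on.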
