GPU-to-GPU and Host-to-Host Multipattern String Matching on a GPU

We develop GPU adaptations of the Aho-Corasick (AC) and multipattern Boyer-Moore string matching algorithms for two cases: GPU-to-GPU (the input to the algorithms is initially in GPU memory and the output is left in GPU memory) and host-to-host (the input and output are in the memory of the host CPU). For the GPU-to-GPU case, we consider several refinements to a base GPU implementation and measure the performance gain from each refinement. For the host-to-host case, we analyze two strategies for communicating between the host and the GPU and show that one is optimal with respect to runtime while the other requires less device memory. This analysis is done for GPUs with one I/O channel to the host as well as for those with two. Experiments were conducted on an NVIDIA Tesla GT200 GPU with 240 cores, hosted by a 2.8 GHz quad-core Xeon CPU. For the GPU-to-GPU case, our AC GPU adaptation achieves a speedup between 8.5 and 9.5 relative to a single-threaded CPU implementation and between 2.4 and 3.2 relative to the best multithreaded implementation. For the host-to-host case, the GPU AC code achieves a speedup of 3.1 relative to a single-threaded CPU implementation. However, the GPU is unable to deliver any speedup relative to the best multithreaded code running on the quad-core host: the measured speedups relative to that multithreaded code ranged between 0.74 and 0.83, meaning the GPU was slower. Early versions of our multipattern Boyer-Moore adaptations ran 7 to 10 percent slower than the corresponding versions of the AC adaptations, so we did not refine the multipattern Boyer-Moore codes further.
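For readers unfamiliar with the Aho-Corasick algorithm that the paper adapts, the following is a minimal serial sketch of multipattern matching with it. It is a plain single-threaded Python illustration, not the authors' GPU code; the function names `build_automaton` and `find_all` and the dictionary-based table layout are assumptions made for this example.

```python
from collections import deque


def build_automaton(patterns):
    """Build Aho-Corasick goto, failure, and output tables (illustrative layout)."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link of each state
    out = [[]]    # indices of the patterns recognized on reaching each state
    for idx, pat in enumerate(patterns):
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(idx)

    # Breadth-first pass: depth-1 states keep failure link 0,
    # deeper states derive theirs from the parent's failure link.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, patterns):
    """Return (start_index, pattern) for every occurrence of any pattern in text."""
    goto, fail, out = build_automaton(patterns)
    matches = []
    state = 0
    for pos, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for idx in out[state]:
            matches.append((pos - len(patterns[idx]) + 1, patterns[idx]))
    return matches


if __name__ == "__main__":
    print(find_all("ushers", ["he", "she", "his", "hers"]))
```

The text is scanned once regardless of how many patterns there are, which is the property that makes the algorithm attractive for multipattern search. A GPU adaptation would have many threads run the same scan over different parts of the input; how the input is partitioned and how the tables are stored on the device are not described in this abstract, so they are not sketched here.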
