Small-ruleset regular expression matching on GPGPUs: quantitative performance analysis and optimization

We explore the intersection between an emerging class of architectures, GPGPUs (General-Purpose Graphics Processing Units), and a prominent workload, regular expression matching. The task is challenging because this workload, with its irregular, non-coalesceable memory access patterns, is very different from the regular, numerical workloads that run efficiently on GPGPUs. Small-ruleset regular expression matching is a fundamental building block for search engines, business analytics, natural language processing, XML processing, compiler front-ends and network security. Despite the abundant computational power that GPGPUs promise, little work has investigated their potential and limitations on this workload, or how best to use the different memory classes that GPGPUs offer. We describe an optimization path for porting the kernel of flex (the popular, open-source regular expression scanner generator) to four nVidia GPGPU models, with each decision based on quantitative micro-benchmarking, performance counters and simulator runs. Our solution achieves a tokenization throughput that exceeds that of the GPGPU-based string matching solutions presented so far, and compares well with solutions on any architecture.
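To make the workload concrete, the Python sketch below shows the kind of table-driven, longest-match loop that a flex-generated scanner runs, plus a naive data-parallel split of the input into independently scanned chunks. It is an illustration under stated assumptions, not the paper's GPGPU kernel. The four-token language, the table names `EC`, `NXT` and `ACCEPT` (loosely analogous to flex's `yy_ec`, `yy_nxt` and `yy_accept` tables) and the helpers `scan`, `chunk_bounds` and `scan_chunked` are hypothetical. The rule of cutting the input only at the start of a whitespace run, standing in for one GPGPU thread per chunk, is also hypothetical. The inner lookup `NXT[state][EC[byte]]` shows why the memory accesses are irregular: each table address depends on the previous state, so threads scanning different chunks touch unrelated table entries.

```python
from typing import List, Optional, Tuple

# Hypothetical token language: identifiers, integers, whitespace runs, and
# single punctuation characters. Not the ruleset used in the paper.
OTHER, LETTER, DIGIT, SPACE = 0, 1, 2, 3

# Byte -> equivalence class (loosely analogous to flex's yy_ec table).
EC = [OTHER] * 256
for _b in range(256):
    _c = chr(_b)
    if "a" <= _c <= "z" or "A" <= _c <= "Z" or _c == "_":
        EC[_b] = LETTER
    elif "0" <= _c <= "9":
        EC[_b] = DIGIT
    elif _c in " \t\n":
        EC[_b] = SPACE

DEAD = -1
# Dense DFA transition table NXT[state][class]; columns: OTHER, LETTER, DIGIT, SPACE.
NXT = [
    [4, 1, 2, 3],              # 0: start
    [DEAD, 1, 1, DEAD],        # 1: inside identifier
    [DEAD, DEAD, 2, DEAD],     # 2: inside integer
    [DEAD, DEAD, DEAD, 3],     # 3: inside whitespace run
    [DEAD, DEAD, DEAD, DEAD],  # 4: after one punctuation character
]
# Token reported when the DFA is in an accepting state (None = not accepting).
ACCEPT = [None, "ID", "NUM", "WS", "PUNCT"]

Token = Tuple[str, int, int]  # (token kind, begin offset, end offset)


def scan(data: bytes, start: int = 0, end: Optional[int] = None) -> List[Token]:
    """Tokenize data[start:end] with the longest-match rule, as flex does."""
    end = len(data) if end is None else end
    tokens: List[Token] = []
    pos = start
    while pos < end:
        state, i = 0, pos
        last_kind, last_end = None, pos
        while i < end:
            # Data-dependent lookup: the address depends on the previous state.
            state = NXT[state][EC[data[i]]]
            if state == DEAD:
                break
            i += 1
            if ACCEPT[state] is not None:
                last_kind, last_end = ACCEPT[state], i  # remember last accepting position
        if last_kind is None:
            raise ValueError(f"no token matches at offset {pos}")
        tokens.append((last_kind, pos, last_end))
        pos = last_end  # resume right after the longest match
    return tokens


def chunk_bounds(data: bytes, n_chunks: int) -> List[int]:
    """Cut points at the start of a whitespace run, so no token straddles a cut."""
    cuts = [0]
    for k in range(1, n_chunks):
        t = max(k * len(data) // n_chunks, cuts[-1] + 1)
        while t < len(data) and not (EC[data[t]] == SPACE and EC[data[t - 1]] != SPACE):
            t += 1
        cuts.append(min(t, len(data)))
    cuts.append(len(data))
    return cuts


def scan_chunked(data: bytes, n_chunks: int) -> List[Token]:
    """Scan each chunk independently; on a GPGPU each chunk would map to one thread."""
    cuts = chunk_bounds(data, n_chunks)
    per_chunk = [scan(data, cuts[k], cuts[k + 1]) for k in range(n_chunks)]
    return [tok for chunk in per_chunk for tok in chunk]


print(scan(b"x1 = 42;\n"))
# [('ID', 0, 2), ('WS', 2, 3), ('PUNCT', 3, 4), ('WS', 4, 5), ('NUM', 5, 7), ('PUNCT', 7, 8), ('WS', 8, 9)]
```

Cutting only at the start of a whitespace run is what makes the chunked result identical to the sequential one for this particular token set. Real rulesets, for example string literals that may contain spaces, would need a different partitioning scheme, which this sketch does not attempt.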
