High-speed string searching against large dictionaries on the Cell/B.E. Processor

Our digital universe is growing, creating exploding amounts of data which need to be searched, filtered and protected. String searching is at the core of the tools we use to curb this explosion, such as search engines, network intrusion detection systems, spam filters, and anti-virus programs. But as communication speed grows, our capability to perform string searching in real-time seems to fall behind. Multi-core architectures promise enough computational power to cope with the incoming challenge, but it is still unclear which algorithms and programming models to use to unleash this power. We have parallelized a popular string searching algorithm, Aho-Corasick, on the IBM Cell/B.E. processor, with the goal of performing exact string matching against large dictionaries. In this article we propose a novel approach to fully exploit the DMA-based communication mechanisms of the Cell/B.E. to provide an unprecedented level of aggregate performance with irregular access patterns. We have discovered that memory congestion plays a crucial role in determining the performance of this algorithm. We discuss three aspects of congestion: memory pressure, layout issues and hot spots, and we present a collection of algorithmic solutions to alleviate these problems and achieve quasi-optimal performance. The implementation of our algorithm provides a worst- case throughput of 2.5 Gbps, and a typical throughput between 3.3 and 4.4 Gbps, measured on realistic scenarios with a two-processor Cell/B.E. system.

[1]  Udi Manber,et al.  Fast text searching: allowing errors , 1992, CACM.

[2]  John W. Lockwood,et al.  Reprogrammable network packet processing on the field programmable port extender (FPX) , 2001, FPGA '01.

[3]  Robert S. Boyer,et al.  A fast string searching algorithm , 1977, CACM.

[4]  Dionisios N. Pnevmatikatos,et al.  Pre-decoded CAMs for efficient and high-speed NIDS pattern matching , 2004, 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[5]  Burton H. Bloom,et al.  Space/time trade-offs in hash coding with allowable errors , 1970, CACM.

[6]  Alfred V. Aho,et al.  Efficient string matching , 1975, Commun. ACM.

[7]  Evangelos P. Markatos,et al.  Performance analysis of content matching intrusion detection systems , 2004, 2004 International Symposium on Applications and the Internet. Proceedings..

[8]  Donald E. Knuth,et al.  Fast Pattern Matching in Strings , 1977, SIAM J. Comput..

[9]  John W. Lockwood,et al.  Deep packet inspection using parallel bloom filters , 2004, IEEE Micro.

[10]  Walid A. Najjar,et al.  Automatic Compilation Framework for Bloom Filter Based Intrusion Detection , 2006, ARC.

[11]  Fabrizio Petrini,et al.  Peak-Performance DFA-based String Matching on the Cell Processor , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.

[12]  Beate Commentz-Walter,et al.  A String Matching Algorithm Fast on the Average , 1979, ICALP.

[13]  John W. Lockwood,et al.  Implementation of a content-scanning module for an Internet firewall , 2003, 11th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, 2003. FCCM 2003..

[14]  Andrei Broder,et al.  Network Applications of Bloom Filters: A Survey , 2004, Internet Math..

[15]  Udi Manber,et al.  Fast Text Searching With Errors , 2005 .

[16]  Richard M. Karp,et al.  Efficient Randomized Pattern-Matching Algorithms , 1987, IBM J. Res. Dev..

[17]  M. V. Ramakrishna,et al.  Efficient Hardware Hashing Functions for High Performance Computers , 1997, IEEE Trans. Computers.

[18]  Bruce W. Watson,et al.  The performance of single-keyword and multiple-keyword pattern matching algorithms , 1994 .

[19]  John A. Chandy,et al.  A CAM-based keyword match processor architecture , 2006, Microelectron. J..

[20]  英哉 岩崎 20世紀の名著名論:D. E. Knuth J. H. Morris V. R. Pratt : Fast pattern matching in Strings , 2004 .