Highly parallel bitmap-based regular expression matching for text analytics

Text analytics has become increasingly important in the past few years as demand from research, business, and government has grown substantially. An efficient text analytics system is likely to require high-performance regular expression matching (REGEX), because REGEX operations dominate the overall execution time. Some approaches have exploited the parallelism of graphics processing units (GPUs) and field-programmable gate arrays (FPGAs) to boost REGEX performance. Nevertheless, those approaches still use finite-state automata to detect the given patterns, and the automaton structure is inherently ill-suited to parallel processing. In this paper, we propose a completely different hardware architecture for REGEX that employs a bitmap index instead of a finite-state automaton. The internal logic gates/registers and the embedded memory of the FPGA are used to construct the query processing units and the bitmap index, respectively. Experimental results on an Intel Arria V FPGA show that our REGEX is fully operational at 100 MHz and can process a 64-character query against 64 KB of text data in 43.76 μs. The achieved throughput therefore reaches 11.98 Gbps.
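To make the bitmap-index idea concrete, the following is a minimal software sketch, not the hardware architecture proposed here. It assumes the simplest form of bitmap index (one bit vector per character, with bit i set when that character occurs at position i of the text) and handles only a literal query with no regular-expression operators. The function names `build_bitmap_index`, `find_literal`, and `throughput_gbps` are hypothetical, and the throughput check assumes 1 KB = 1024 bytes.

```python
from collections import defaultdict


def build_bitmap_index(text):
    """Map each character to an int whose bit i is set when text[i] is that character."""
    index = defaultdict(int)
    for i, ch in enumerate(text):
        index[ch] |= 1 << i
    return dict(index)


def find_literal(index, text_len, query):
    """Return the start positions of a literal query using only shifts and ANDs.

    Every text position is tested at once: shifting the bitmap of query[j]
    right by j aligns "query[j] occurs at p + j" with candidate start p.
    """
    if not query or len(query) > text_len:
        return []
    # Start with every position where the whole query still fits in the text.
    candidates = (1 << (text_len - len(query) + 1)) - 1
    for j, ch in enumerate(query):
        candidates &= index.get(ch, 0) >> j
        if candidates == 0:
            break
    return [p for p in range(text_len) if (candidates >> p) & 1]


def throughput_gbps(num_bytes, seconds):
    """Throughput in gigabits per second for num_bytes processed in the given time."""
    return num_bytes * 8 / seconds / 1e9


if __name__ == "__main__":
    text = "abracadabra"
    index = build_bitmap_index(text)
    print(find_literal(index, len(text), "abra"))
    # Assumption: 64 KB = 64 * 1024 bytes, processed in 43.76 microseconds.
    print(round(throughput_gbps(64 * 1024, 43.76e-6), 2))
```

Because each step is a bitwise AND over the whole text, no state depends on the previous character, which is what makes this formulation easier to parallelize than stepping an automaton. The last line reproduces the reported throughput figure from the stated text size and processing time.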
