A Parallel BMH String Matching Algorithm Based on OpenMP

Boyer-Moore-Horspool (BMH) string matching algorithms play an important role in biological sequence alignment, text processing, spell checking, and computer virus signature matching. However, with the explosive growth of data, the serial string matching algorithm is too slow to finish matching tasks within an acceptable time. This paper proposes a parallel BMH string matching algorithm based on OpenMP. The parallelism available in the BMH algorithm is first identified by theoretical analysis. The sequential algorithm is then redesigned in a data-parallel form, and a partitioning strategy divides the large string matching task into multiple sub-string matching tasks. Using thread-level parallelism, each thread carries out string matching on a different sub-string block in parallel. Experimental results on the OpenMP platform show that the proposed parallel algorithm achieves a significant speedup: compared with the serial BMH algorithm, the parallel BMH algorithm with 8 threads achieves a speedup of up to 3.53 times. The optimal number of threads and the optimal block segmentation scheme, subject to preserving matching accuracy, are obtained through experimental tests. These results indicate that the OpenMP-based parallel BMH algorithm has excellent acceleration performance and promising application prospects in various fields.
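The partitioning idea described above can be illustrated with a minimal sketch in C with OpenMP. It is not the paper's implementation: the function names `bmh_count_range` and `bmh_count_parallel`, the `nblocks` parameter, the choice to count matches rather than report positions, and the use of `memcmp` for the window comparison are all assumptions made for illustration. Each block owns a range of candidate start positions and may read up to m - 1 characters past its end, so matches that straddle a block boundary are counted exactly once.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Serial BMH scan (hypothetical helper): counts occurrences of pat[0..m)
   in text[0..n) whose start index lies in [lo, hi). The window may extend
   up to m - 1 characters beyond hi, which is the overlap between blocks. */
static size_t bmh_count_range(const unsigned char *text, size_t n,
                              const unsigned char *pat, size_t m,
                              const size_t *shift, size_t lo, size_t hi)
{
    size_t count = 0;
    size_t i = lo;
    while (i < hi && i + m <= n) {
        /* Horspool allows any comparison order; memcmp keeps the sketch short. */
        if (memcmp(text + i, pat, m) == 0)
            count++;
        /* Shift by the bad-character value of the last character in the window. */
        i += shift[text[i + m - 1]];
    }
    return count;
}

/* Data-parallel BMH (hypothetical): splits the candidate start positions
   into nblocks contiguous blocks and scans them with OpenMP threads.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
size_t bmh_count_parallel(const char *text, size_t n,
                          const char *pat, size_t m, int nblocks)
{
    assert(text != NULL && pat != NULL);
    if (m == 0 || m > n || nblocks < 1)
        return 0;

    /* Horspool bad-character table: default shift m, and m - 1 - j for
       every pattern character except the last one. */
    size_t shift[UCHAR_MAX + 1];
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    for (size_t j = 0; j + 1 < m; j++)
        shift[(unsigned char)pat[j]] = m - 1 - j;

    size_t starts = n - m + 1;               /* candidate start positions */
    size_t base = starts / (size_t)nblocks;  /* minimum block size */
    size_t rem = starts % (size_t)nblocks;   /* first rem blocks get one extra */
    size_t total = 0;

    #pragma omp parallel for reduction(+:total) schedule(static)
    for (int b = 0; b < nblocks; b++) {
        size_t ub = (size_t)b;
        size_t lo = ub * base + (ub < rem ? ub : rem);
        size_t hi = lo + base + (ub < rem ? 1 : 0);
        total += bmh_count_range((const unsigned char *)text, n,
                                 (const unsigned char *)pat, m,
                                 shift, lo, hi);
    }
    return total;
}
```

Restarting the scan at an arbitrary block boundary is safe because a Horspool shift never skips an occurrence that begins after the current window position. In this sketch the number of blocks is independent of the number of threads, which would normally be set through the `OMP_NUM_THREADS` environment variable or `omp_set_num_threads`; the block count and thread count that work best are an empirical question, as the abstract notes.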
