Parallelizing Exact and Approximate String Matching via Inclusive Scan on a GPU

In this study, to substantially improve the runtimes of exact and approximate string matching algorithms, we propose a tribrid parallel method for bit-parallel algorithms such as the Shift-Or and Wu-Manber algorithms. Our key idea is to interpret bit-parallel algorithms as inclusive-scan operations, which lets them run efficiently on a graphics processing unit (GPU). This yields a speed-up because inclusive-scan operations both eliminate duplicate searches between threads and produce a GPU-friendly memory access pattern that maximizes memory read/write throughput. To realize this idea, we first define two binary operators and then prove that they are associative, a property required to parallelize the inclusive-scan operations. Finally, we integrate the inclusive-scan scheme into an existing segmentation-based scheme to maximize search throughput, identifying the best tradeoff between synchronization cost and duplicate work. In our experiments, we compared our proposed method with previous segmentation-based methods and with indexing-based sequence aligners. For online string matching, our method ran 6.7 to 16.7 times faster than previous methods, achieving a search throughput of up to 1.88 terabits per second (Tbps) on a GeForce GTX TITAN X GPU. We therefore conclude that our method is effective for reducing the runtimes of online string matching with short patterns.
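To make the scan interpretation concrete, here is a minimal Python sketch for exact matching with Shift-Or only. It assumes one way of obtaining an associative operator: each text character is represented by the state update D -> ((D << s) | M) truncated to m bits, written as a pair (s, M), and two updates compose into another pair. This pair encoding and the names `build_masks`, `make_combine`, `inclusive_scan`, and `shift_or_scan` are hypothetical illustrations, not the two operators defined in the paper; the sketch also omits the Wu-Manber extension and the segmentation scheme. The scan is the textbook Hillis-Steele formulation, whose rounds consist of independent updates, so each element could in principle map to one GPU thread.

```python
from typing import Callable, Dict, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
# Hypothetical encoding: (s, M) stands for the update D -> ((D << s) | M), truncated to m bits.
Pair = Tuple[int, int]


def build_masks(pattern: str) -> Dict[str, int]:
    """Shift-Or masks: bit i of B[c] is 0 iff pattern[i] == c."""
    full = (1 << len(pattern)) - 1
    masks: Dict[str, int] = {}
    for i, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, full) & ~(1 << i)
    return masks


def shift_or(text: str, pattern: str) -> List[int]:
    """Sequential Shift-Or: returns start positions of exact matches."""
    m = len(pattern)
    full = (1 << m) - 1
    B = build_masks(pattern)
    D = full  # all ones: no prefix of the pattern matched yet
    hits = []
    for j, ch in enumerate(text):
        D = ((D << 1) | B.get(ch, full)) & full
        if not (D >> (m - 1)) & 1:  # bit m-1 cleared => a full match ends at j
            hits.append(j - m + 1)
    return hits


def make_combine(m: int) -> Callable[[Pair, Pair], Pair]:
    """Associative operator: compose update a (applied first) with update b."""
    full = (1 << m) - 1

    def combine(a: Pair, b: Pair) -> Pair:
        s1, m1 = a
        s2, m2 = b
        # ((D << s1 | m1) << s2) | m2 == (D << (s1 + s2)) | (m1 << s2) | m2.
        # After truncation to m bits, shifts of m or more clear every bit,
        # so capping the shift at m does not change the result.
        return (min(s1 + s2, m), ((m1 << s2) | m2) & full)

    return combine


def inclusive_scan(xs: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Hillis-Steele inclusive scan: ceil(log2 n) rounds whose updates are
    mutually independent; keeps left-to-right order for non-commutative op."""
    out = list(xs)
    d = 1
    while d < len(out):
        out = [op(out[i - d], out[i]) if i >= d else out[i] for i in range(len(out))]
        d *= 2
    return out


def shift_or_scan(text: str, pattern: str) -> List[int]:
    """Shift-Or recast as an inclusive scan over (shift, mask) pairs."""
    m = len(pattern)
    full = (1 << m) - 1
    B = build_masks(pattern)
    elems = [(1, B.get(ch, full)) for ch in text]  # one update per character
    hits = []
    for j, (s, msk) in enumerate(inclusive_scan(elems, make_combine(m))):
        D = ((full << s) | msk) & full  # apply the prefix update to the initial state
        if not (D >> (m - 1)) & 1:
            hits.append(j - m + 1)
    return hits


print(shift_or_scan("abracadabra", "abra"))  # [0, 7]
```

Because `combine` is associative, the per-character updates can be grouped in any order, which is exactly what lets a parallel scan replace the sequential loop without threads re-reading overlapping text. The cap `min(s1 + s2, m)` also reflects why only the last m characters influence each match decision.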