H‐BLAST: a fast protein sequence alignment toolkit on heterogeneous computers with GPUs

Motivation: Sequence alignment is a fundamental problem in bioinformatics. BLAST is a routinely used tool for this purpose, with over 118 000 citations in the past two decades. As the size of biological sequence databases grows exponentially, the computational speed of alignment software must be improved.

Results: We develop heterogeneous BLAST (H‐BLAST), a fast parallel search tool for heterogeneous computers that couple CPUs and GPUs, to accelerate BLASTX and BLASTP, two basic tools of NCBI‐BLAST. H‐BLAST employs a locally decoupled seed‐extension algorithm for better performance on GPUs and offers a performance tuning mechanism for better efficiency across various combinations of CPUs and GPUs. H‐BLAST produces the same alignment results as NCBI‐BLAST and is much faster. Speedups achieved by H‐BLAST over sequential NCBI‐BLASTP (resp. NCBI‐BLASTX) range mostly from 4 to 10 (resp. 5 to 7.2). With 2 CPU threads and 2 GPUs, H‐BLAST can be faster than 16‐threaded NCBI‐BLASTX. Furthermore, H‐BLAST is 1.5 to 4 times faster than GPU‐BLAST.

Availability and Implementation: https://github.com/Yeyke/H‐BLAST.git

Contact: yux06@syr.edu

Supplementary information: Supplementary data are available at Bioinformatics online.
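As a rough illustration of what separating seed detection from seed extension means in a BLAST‐style search, the sketch below runs the two phases as separate passes: it first buffers every word hit, then extends each buffered hit with ungapped X‐drop extension. This is a generic simplification, not H‐BLAST's locally decoupled algorithm. Exact word matching replaces BLASTP's neighborhood words, and MATCH/MISMATCH are hypothetical stand‐ins for BLOSUM62 scores; the function names (find_seeds, extend_seed, decoupled_search) are illustrative only.

```python
from collections import defaultdict

# Hypothetical scores standing in for a BLOSUM62 lookup (assumption, not H-BLAST's scoring).
MATCH, MISMATCH = 5, -4


def pair_score(a: str, b: str) -> int:
    """Score one aligned residue pair with the simplified match/mismatch scheme."""
    return MATCH if a == b else MISMATCH


def find_seeds(query: str, subject: str, w: int = 3) -> list[tuple[int, int]]:
    """Phase 1: buffer every exact length-w word hit as (query_pos, subject_pos).

    Real BLASTP uses neighborhood words scoring above a threshold T;
    exact words keep this sketch short.
    """
    index = defaultdict(list)
    for i in range(len(query) - w + 1):
        index[query[i:i + w]].append(i)
    seeds = []
    for j in range(len(subject) - w + 1):
        for i in index.get(subject[j:j + w], []):
            seeds.append((i, j))
    return seeds


def extend_seed(query: str, subject: str, qpos: int, spos: int,
                w: int = 3, x_drop: int = 10) -> tuple[int, int, int, int, int]:
    """Phase 2: ungapped X-drop extension of one seed in both directions.

    Returns (score, q_start, q_end, s_start, s_end) with half-open end indices.
    """
    seed = sum(pair_score(query[qpos + k], subject[spos + k]) for k in range(w))

    # Extend right; stop once the running score falls more than x_drop below its best.
    best_r = run = right = 0
    k = 0
    while qpos + w + k < len(query) and spos + w + k < len(subject):
        run += pair_score(query[qpos + w + k], subject[spos + w + k])
        k += 1
        if run > best_r:
            best_r, right = run, k
        elif best_r - run > x_drop:
            break

    # Extend left with the same X-drop rule.
    best_l = run = left = 0
    k = 0
    while qpos - k - 1 >= 0 and spos - k - 1 >= 0:
        run += pair_score(query[qpos - k - 1], subject[spos - k - 1])
        k += 1
        if run > best_l:
            best_l, left = run, k
        elif best_l - run > x_drop:
            break

    return (seed + best_l + best_r,
            qpos - left, qpos + w + right,
            spos - left, spos + w + right)


def decoupled_search(query: str, subject: str, w: int = 3, x_drop: int = 10):
    """Run seeding to completion first, then extend the buffered seeds in a separate pass."""
    seeds = find_seeds(query, subject, w)
    # Seeds on the same diagonal often grow into the same segment; a set removes duplicates.
    hsps = {extend_seed(query, subject, q, s, w, x_drop) for q, s in seeds}
    return sorted(hsps, reverse=True)
```

For example, decoupled_search("MKTAYIAKQR", "GGMKTAYIAKQRGG") returns [(50, 0, 10, 2, 12)]: all eight seeds lie on one diagonal and extend to the same full‐length segment. Keeping the phases apart gives each pass uniform work over its inputs, the kind of regularity GPUs tend to favor; how H‐BLAST actually organizes and schedules this work is not shown here.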