Blast-Parallel: The parallelizing implementation of sequence alignment algorithms based on Hadoop platform

Sequence alignment is a basic method for processing information in bioinformatics. It is of great significance for discovering the function and structure of nucleic acid and protein sequences and for recovering evolutionary information. This paper briefly describes the relevant issues of sequence alignment and the most common local sequence alignment algorithm, the Blast algorithm. At present, neither the Blast service provided by NCBI nor the stand-alone Blast program can meet the practical demands created by the flood of biological data. This paper therefore implements the Blast-Parallel algorithm as a further improvement on the Hadoop-Blast algorithm. Serial experiments with the stand-alone Blast algorithm and parallel experiments with the Hadoop-Blast and Blast-Parallel algorithms on the Hadoop platform show that the Blast algorithm executes significantly more efficiently after parallelization, and that the matching speed of the improved Blast-Parallel algorithm reaches 1 to 1.5 times that of the Hadoop-Blast algorithm.
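
The abstract does not spell out how the work is divided, so the following is only a minimal sketch of the general map/merge pattern that query-partitioned parallel Blast tools commonly use, not the authors' Blast-Parallel implementation. It assumes the query set is split into chunks, each chunk is searched independently (the map step), and the partial hit lists are merged afterwards. All function names here are hypothetical, a thread pool stands in for Hadoop workers, and `toy_search` is a placeholder that counts shared 3-letter words rather than running Blast.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Parse FASTA text into a list of (identifier, sequence) pairs."""
    records, name, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(parts)))
            name, parts = line[1:].split()[0], []
        else:
            parts.append(line)
    if name is not None:
        records.append((name, "".join(parts)))
    return records


def split_into_chunks(records, n_chunks):
    """Distribute records round-robin and drop any empty chunks."""
    chunks = [records[i::n_chunks] for i in range(n_chunks)]
    return [chunk for chunk in chunks if chunk]


def kmers(seq, k=3):
    """Return the set of length-k words contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(chunk, database, k=3):
    """Map step (placeholder for a real Blast run on one chunk).

    For each query, report the database entry sharing the most k-letter
    words, as (query_id, subject_id, shared_word_count).
    """
    hits = []
    for query_id, query_seq in chunk:
        query_words = kmers(query_seq, k)
        best_id, best_score = None, 0
        for subject_id, subject_seq in database:
            score = len(query_words & kmers(subject_seq, k))
            if score > best_score:
                best_id, best_score = subject_id, score
        hits.append((query_id, best_id, best_score))
    return hits


def parallel_search(queries, database, n_workers=2):
    """Split the queries, search the chunks concurrently, merge the hits."""
    chunks = split_into_chunks(queries, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial = list(pool.map(lambda c: toy_search(c, database), chunks))
    # Merge step: concatenate per-chunk results and order them by query id.
    return sorted(hit for part in partial for hit in part)


if __name__ == "__main__":
    queries = parse_fasta(">q1\nACGTACGT\n>q2\nTTTTGGGG\n>q3\nCCCC\n")
    database = [("s1", "ACGTAC"), ("s2", "TTTTGG")]
    for hit in parallel_search(queries, database):
        print(hit)
```

Because each chunk is searched without reference to the others, the merged output matches what a serial run over all queries would produce. That independence is what makes this kind of workload a natural fit for a MapReduce platform.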
