Parka: A Parallel Implementation of BLAST with MapReduce

Bioinformatics applications have become increasingly data-intensive and compute-intensive, so they need an effective way to run in parallel and achieve high throughput. Although some tools already parallelize BLAST, most of them depend on complex platforms or software. This paper presents Parka, a parallel implementation of BLAST built on Spark. Parka's parallel execution time and speedup are evaluated in a cluster environment and compared with a Hadoop-based parallelization method. The results show that Parka is a scalable and effective parallelization approach for sequence alignment.
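The speedup metric mentioned above is conventionally the serial execution time divided by the parallel execution time, and parallel efficiency is that ratio divided by the number of workers. The following is a minimal sketch of that calculation, not the evaluation code used for Parka. The function names and the timing values are hypothetical placeholders chosen for illustration and are not results from the paper.

```python
def speedup(serial_seconds: float, parallel_seconds: float) -> float:
    """Return the conventional speedup: serial time divided by parallel time."""
    if serial_seconds <= 0 or parallel_seconds <= 0:
        raise ValueError("execution times must be positive")
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds: float, parallel_seconds: float, workers: int) -> float:
    """Return parallel efficiency: speedup divided by the number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(serial_seconds, parallel_seconds) / workers


# Hypothetical placeholder timings (workers -> seconds); NOT measurements from the paper.
timings = {1: 1200.0, 2: 640.0, 4: 350.0, 8: 200.0}
baseline = timings[1]

for workers, seconds in sorted(timings.items()):
    s = speedup(baseline, seconds)
    e = efficiency(baseline, seconds, workers)
    print(f"{workers} workers: speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency close to 1.0 as workers are added is what a claim of scalability would typically correspond to under this definition.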
