Paper Information - Parka: A Parallel Implementation of BLAST with MapReduce

Parka: A Parallel Implementation of BLAST with MapReduce

Bioinformatics applications have become increasingly data-intensive and compute-intensive, which calls for an effective way to parallelize computation and achieve high throughput. Although some tools exist to parallelize BLAST, most of them depend on complex platforms or software. A parallel BLAST, called Parka, is implemented using Spark. The parallel execution time and speedup of Parka are evaluated in a cluster environment and compared with a Hadoop-based parallelization method. Results show that Parka is a scalable and effective parallelization approach for sequence alignment.

Li Zhang | Bing Tang
