Small World Asynchronous Parallel Model for Genome Assembly

Large de bruijn graph based algorithm is widely used in genome assembly and metagenetic assembly. The scale of this kind of graphs - in some cases billions of vertices and edges - poses challenges to genome assembly problem. In this paper, a one-step bi-directed graph is used to abstract the problem of genome assembly. After that small world asynchronous parallel model (SWAP) is proposed to handle the edge merging operation predefined in the graph. SWAP aims at making use of the locality of computing and communication to explore parallelism for graph algorithm. Based on the above graph abstraction and SWAP model, an assembler is developed, and experiment results shows that a factor of 20 times speedup is achieved when the number of processors scales from 10 to 640 when testing on processing C.elegans data.

[1]  Srinivas Aluru,et al.  Parallel de novo assembly of large genomes from high-throughput short reads , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[2]  Giuseppe Di Battista,et al.  26 Computer Networks , 2004 .

[3]  P. Pevzner,et al.  An Eulerian path approach to DNA fragment assembly , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[4]  Albert Chan,et al.  CGMGRAPH/CGMLIB: Implementing and Testing CGM Graph Algorithms on PC Clusters and Shared Memory Machines , 2005, Int. J. High Perform. Comput. Appl..

[5]  Torsten Suel,et al.  Portable and Efficient Parallel Computing Using the BSP Model , 1999, IEEE Trans. Computers.

[6]  Olaf Bonorden,et al.  The Paderborn University BSP (PUB) library , 2003, Parallel Comput..

[7]  Srinivas Aluru,et al.  Parallel short sequence assembly of transcriptomes , 2009, BMC Bioinformatics.

[8]  冯圣中,et al.  Small World Asynchronous Parallel Model for Genome Assembly , 2012 .

[9]  Douglas P. Gregor,et al.  The Parallel BGL : A Generic Library for Distributed Graph Computations , 2005 .

[10]  Michael S. Waterman,et al.  A New Algorithm for DNA Sequence Assembly , 1995, J. Comput. Biol..

[11]  S. Bennett Solexa Ltd. , 2004, Pharmacogenomics.

[12]  Leslie G. Valiant,et al.  A bridging model for parallel computation , 1990, CACM.

[13]  Srinivas Aluru,et al.  Parallel Construction of Bidirected String Graphs for Genome Assembly , 2008, 2008 37th International Conference on Parallel Processing.

[14]  Bairong Shen,et al.  A Practical Comparison of De Novo Genome Assembly Software Tools for Next-Generation Sequencing Technologies , 2011, PloS one.

[15]  Huanming Yang,et al.  De novo assembly of human genomes with massively parallel short read sequencing. , 2010, Genome research.

[16]  Andrew Lumsdaine,et al.  Lifting sequential graph algorithms for distributed-memory parallel computation , 2005, OOPSLA '05.

[17]  Sanjay Ghemawat,et al.  MapReduce: Simplified Data Processing on Large Clusters , 2004, OSDI.

[18]  Aart J. C. Bik,et al.  Pregel: a system for large-scale graph processing , 2010, SIGMOD Conference.

[19]  Siu-Ming Yiu,et al.  IDBA - A Practical Iterative de Bruijn Graph De Novo Assembler , 2010, RECOMB.

[20]  Steven J. M. Jones,et al.  Abyss: a Parallel Assembler for Short Read Sequence Data Material Supplemental Open Access , 2022 .

[21]  E. Birney,et al.  Velvet: algorithms for de novo short read assembly using de Bruijn graphs. , 2008, Genome research.