Large-Scale DNA Sequence Assembly by Using Computing Grid

DNA sequence assembly is a fundamental part of biological computing. However, most large-scale sequence assemblies require intensive computing power and huge storage. To speed up the assembly process, we propose a method for large-scale DNA sequence assembly using a computing grid. The central idea of our method is to first cluster the input fragment set into many non-intersecting subsets using k-mers and then to distribute these subsets across the nodes of the grid-computing system. On the test data sets, run under a simulated grid-computing system, our method achieves an accuracy of more than 92% while taking less time and using less storage. Our method can efficiently process large-scale DNA sequence assembly by taking advantage of the huge storage and computing capacity of a computing grid.
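The clustering-then-distribution idea can be illustrated with a minimal sketch. It is not the implementation evaluated here: it assumes that two fragments belong to the same subset whenever they share at least one exact k-mer (grouped with a union-find structure), ignores reverse complements and sequencing errors, and assigns subsets to nodes with a simple largest-first greedy rule. The function names `cluster_by_shared_kmers` and `assign_to_nodes` and the toy reads are hypothetical.

```python
from collections import defaultdict


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(reads, k):
    """Group read indices into disjoint subsets (assumed rule: reads that
    share any exact k-mer, directly or transitively, fall in one subset)."""
    parent = list(range(len(reads)))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # k-mer -> index of the first read containing it
    for idx, read in enumerate(reads):
        for km in kmers(read, k):
            if km in first_seen:
                union(idx, first_seen[km])
            else:
                first_seen[km] = idx

    clusters = defaultdict(list)
    for idx in range(len(reads)):
        clusters[find(idx)].append(idx)
    return sorted(clusters.values())


def assign_to_nodes(clusters, n_nodes):
    """Assumed scheduling rule: hand the largest remaining subset to the
    node that currently holds the fewest reads."""
    loads = [0] * n_nodes
    assignment = [[] for _ in range(n_nodes)]
    for cluster in sorted(clusters, key=len, reverse=True):
        target = loads.index(min(loads))
        assignment[target].append(cluster)
        loads[target] += len(cluster)
    return assignment


if __name__ == "__main__":
    reads = ["ACGTAC", "GTACGG", "TTTTTT", "TTTTAA"]  # toy fragments
    subsets = cluster_by_shared_kmers(reads, k=4)
    print(subsets)
    print(assign_to_nodes(subsets, n_nodes=2))
```

Because the subsets are disjoint, each node can assemble its share independently with any sequential assembler, and the resulting contigs only need to be collected at the end. The choice of k governs the trade-off: a small k merges unrelated fragments through chance matches, while a large k splits fragments that truly overlap.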
