Barnacle: An Assembly Algorithm for Clone-based Sequences of Whole Genomes

We propose Barnacle, an assembly algorithm for sequences generated by the clone-based approach, and illustrate it by assembling the human genome. Our novel method abandons the original physical-mapping-first framework. As we show, Barnacle more effectively resolves conflicts caused by repeated sequences, which are the main difficulty of the sequence assembly problem. In addition, we are able to detect inconsistencies in the underlying data. We present our results on the December 2001 freeze of the public working draft of the human genome and compare them with NCBI's assembly (Build 28). The assembly of the December 2001 freeze of the public working draft generated by Barnacle and the source code of Barnacle are available at http://www.cs.rutgers.edu/~vchoi.
