Comparative genome assembly

Genome assembly is one of the most complex and computationally intensive tasks in genome sequence analysis. Even today, few centres have the software and hardware resources to assemble a genome from the thousands or millions of individual sequences generated in a whole-genome shotgun sequencing project. As the number of sequenced genomes has grown rapidly, so has the number of cases in which two or more closely related species have been sequenced. This makes it possible to build a comparative genome assembly algorithm, which assembles a newly sequenced genome by mapping its sequences onto a reference genome. We describe here a novel algorithm for comparative genome assembly that can accurately assemble a typical bacterial genome in less than four minutes on a standard desktop computer. The software is available as part of the open-source AMOS project.
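To make the idea of assembling by mapping onto a reference concrete, the following is a minimal toy sketch in Python. It is not the algorithm described here and not part of the AMOS software: it assumes substitution-only differences (no insertions or deletions), places each read with exact k-mer seeds followed by a mismatch count, and calls a majority-vote consensus per reference column. All function names and parameters (`build_kmer_index`, `place_read`, `comparative_assemble`, `k`, `max_mismatches`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def place_read(read, reference, index, k, max_mismatches):
    """Return the reference offset where the read fits best, or None.

    Toy assumption: reads differ from the reference only by substitutions,
    so a placement is a single offset with no gaps.
    """
    best = None  # (mismatch count, start offset)
    for j in range(len(read) - k + 1):
        for pos in index.get(read[j:j + k], ()):
            start = pos - j
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[0]):
                best = (mismatches, start)
    return None if best is None else best[1]


def comparative_assemble(reads, reference, k=8, max_mismatches=2):
    """Lay reads out along the reference and return consensus contigs."""
    index = build_kmer_index(reference, k)
    columns = [Counter() for _ in reference]  # base counts per reference column
    for read in reads:
        start = place_read(read, reference, index, k, max_mismatches)
        if start is None:
            continue  # read could not be placed; a real assembler would handle it separately
        for offset, base in enumerate(read):
            columns[start + offset][base] += 1

    # A contig is a maximal run of covered columns; each base is the majority vote.
    contigs, current = [], []
    for column in columns:
        if column:
            current.append(column.most_common(1)[0][0])
        elif current:
            contigs.append("".join(current))
            current = []
    if current:
        contigs.append("".join(current))
    return contigs
```

In this sketch the reference determines only the layout of the reads; the consensus bases come from the reads themselves, so differences between the new genome and the reference are preserved in the output. Regions of the reference that no read covers split the result into separate contigs.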
