Chaining Algorithms for Alignment of Draft Sequence

In this paper we propose a chaining method that can align a draft genomic sequence against a finished genome. We introduce the use of an overlap tree to enhance the state information available to the chaining procedure in the context of sparse dynamic programming, and demonstrate that the resulting procedure more accurately penalizes the various biological rearrangements. The algorithm is tested on a whole genome alignment of seven yeast species. We also demonstrate a variation on the algorithm that can be used for co-assembly of two genomes and show how it can improve the current assembly of the Ciona savignyi (sea squirt) genome.
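To make the term "chaining" concrete, the sketch below selects the highest-scoring collinear chain from a set of local alignments (anchors) using plain quadratic dynamic programming. It is an illustration only and not the method proposed here: it uses no overlap tree, does not model rearrangements or draft contigs, and does not use the sparse dynamic programming speed-up. The `Anchor` layout, the `chain_anchors` name, and the flat `gap_penalty` parameter are assumptions introduced for this example.

```python
from typing import List, Tuple

# Hypothetical anchor representation (an assumption for this sketch):
# (start1, end1, start2, end2, score), with non-empty half-open intervals
# [start, end) on sequence 1 and sequence 2.
Anchor = Tuple[int, int, int, int, float]


def chain_anchors(anchors: List[Anchor],
                  gap_penalty: float = 0.0) -> Tuple[float, List[Anchor]]:
    """Return (score, chain) for the best collinear chain of anchors.

    Anchor a may precede anchor b only if a ends before b starts on
    BOTH sequences. Each link in the chain costs a flat gap_penalty.
    This is the simple O(n^2) recurrence, not sparse dynamic programming.
    """
    if not anchors:
        return 0.0, []

    # Sorting by start on sequence 1 guarantees every valid predecessor
    # of order[j] appears at an index i < j (intervals are non-empty).
    order = sorted(anchors, key=lambda a: (a[0], a[2]))
    n = len(order)
    best = [a[4] for a in order]   # best chain score ending at each anchor
    prev = [-1] * n                # back-pointers for traceback

    for j in range(n):
        b = order[j]
        for i in range(j):
            a = order[i]
            if a[1] <= b[0] and a[3] <= b[2]:
                cand = best[i] + b[4] - gap_penalty
                if cand > best[j]:
                    best[j] = cand
                    prev[j] = i

    # Trace back from the best-scoring end point.
    end = max(range(n), key=lambda k: best[k])
    chain = []
    k = end
    while k != -1:
        chain.append(order[k])
        k = prev[k]
    chain.reverse()
    return best[end], chain


example = [
    (0, 10, 0, 10, 5.0),
    (20, 30, 5, 8, 4.0),     # out of order on sequence 2, so it is skipped
    (12, 18, 12, 18, 3.0),
    (25, 35, 25, 35, 6.0),
]
score, chain = chain_anchors(example)
```

In this example the second anchor cannot join the chain because it would require moving backwards on the second sequence. A collinear chainer simply discards such anchors, which is the kind of case a rearrangement-aware scoring scheme is meant to handle more carefully.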
