A Local Chaining Algorithm and Its Applications in Comparative Genomics

Given fragments from multiple genomes, we show how to find an optimal local chain of colinear non-overlapping fragments in sub-quadratic time, using methods from computational geometry. A variant of the algorithm finds all significant local chains of such fragments. The local chaining algorithm applies to a variety of problems in comparative genomics: the identification of regions of similarity (candidate regions of conserved synteny), the detection of genome rearrangements such as transpositions and inversions, and exon prediction.
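To make the notion of an optimal local chain concrete, the following is a minimal sketch of a naive quadratic-time baseline. It is not the sub-quadratic, geometry-based algorithm the abstract refers to. Everything in it is an assumption made for illustration: the `Fragment` type, the helper names, the strict `<` test for "colinear and non-overlapping", and the L1-style gap cost. As in local alignment, a chain may start afresh at any fragment, so a long gap never drags a score below the fragment's own weight.

```python
from typing import List, NamedTuple, Optional, Tuple


class Fragment(NamedTuple):
    """Hypothetical fragment: a matching region with one interval per genome."""
    beg: Tuple[int, ...]   # start position in each genome
    end: Tuple[int, ...]   # end position in each genome (inclusive)
    weight: float          # e.g. length or similarity score of the match


def precedes(a: Fragment, b: Fragment) -> bool:
    """True if a ends strictly before b begins in every genome
    (assumed meaning of 'colinear and non-overlapping')."""
    return all(a_end < b_beg for a_end, b_beg in zip(a.end, b.beg))


def gap_cost(a: Fragment, b: Fragment) -> int:
    """Assumed gap cost: number of uncovered positions between a and b,
    summed over all genomes (an L1-style distance)."""
    return sum(b_beg - a_end - 1 for a_end, b_beg in zip(a.end, b.beg))


def best_local_chain(fragments: List[Fragment]) -> Tuple[float, List[int]]:
    """Return (score, fragment indices) of a highest-scoring local chain.

    Naive O(n^2) dynamic programming over all fragment pairs.
    """
    if not fragments:
        return 0.0, []
    # If a precedes b, then a.beg[0] <= a.end[0] < b.beg[0], so sorting by
    # the start in the first genome visits every predecessor first.
    order = sorted(range(len(fragments)), key=lambda i: fragments[i].beg[0])
    score = {}
    prev: dict = {}
    for pos, i in enumerate(order):
        f = fragments[i]
        score[i] = f.weight          # local chaining: a chain may start here
        prev[i] = None
        for j in order[:pos]:
            g = fragments[j]
            if precedes(g, f):
                candidate = score[j] + f.weight - gap_cost(g, f)
                if candidate > score[i]:
                    score[i] = candidate
                    prev[i] = j
    # Trace back from the best-scoring chain end.
    cur: Optional[int] = max(score, key=score.get)
    best = score[cur]
    chain = []
    while cur is not None:
        chain.append(cur)
        cur = prev[cur]
    chain.reverse()
    return best, chain
```

Replacing the inner loop over all earlier fragments with a range-maximum query on a geometric data structure is the kind of step that would bring the running time below quadratic; the sketch deliberately leaves that out.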
