Segment Match Refinement and Applications

Comparing large, unfinished genomic sequences requires fast methods that are robust to misordering, misorientation, and duplication. Several fast methods can compute local similarities between such sequences, and from these local similarities one often wants to derive an optimal one-to-one correspondence. However, existing methods for computing such a correspondence are either too costly to run or unsuited to unfinished sequence. We propose an efficient method for refining a set of segment matches so that the resulting segments are of maximal size and have no non-identity overlaps. This resolved set of segments can be used in various ways to compute a similarity measure between any two large sequences. It can therefore serve as input to alignment, matching, or tree construction algorithms for two or more sequences.
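The abstract does not spell out the refinement procedure. The sketch below is a rough illustration of the goal only. It is not the efficient method proposed here; it is a naive fixpoint computation. Every segment boundary becomes a cut on its sequence. Any cut that falls strictly inside a match is then copied to the corresponding position on the other sequence, and this repeats until no new cuts appear. Each match is finally split at the cuts inside it, so that two pieces on the same sequence are either identical or disjoint.

It assumes ungapped, forward-strand matches between two distinct sequences A and B. Reverse-complement matches, which misorientation would require, are left out. The names `Match`, `refine`, and `no_nonidentity_overlaps` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Match:
    """Hypothetical ungapped, forward-strand segment match:
    positions [a, a + length) of sequence A align to [b, b + length) of sequence B."""
    a: int
    b: int
    length: int


def refine(matches):
    """Naive fixpoint refinement (illustrative only, not the paper's algorithm).

    Cut positions are propagated through every match until nothing changes,
    then each match is split at the cuts lying strictly inside it."""
    # Every segment boundary is a mandatory cut on its own sequence.
    cuts_a, cuts_b = set(), set()
    for m in matches:
        cuts_a.update((m.a, m.a + m.length))
        cuts_b.update((m.b, m.b + m.length))

    changed = True
    while changed:
        changed = False
        for m in matches:
            # A cut strictly inside m's A-interval forces the matching cut in B ...
            for c in [c for c in cuts_a if m.a < c < m.a + m.length]:
                t = m.b + (c - m.a)
                if t not in cuts_b:
                    cuts_b.add(t)
                    changed = True
            # ... and a cut strictly inside m's B-interval forces one in A.
            for c in [c for c in cuts_b if m.b < c < m.b + m.length]:
                t = m.a + (c - m.b)
                if t not in cuts_a:
                    cuts_a.add(t)
                    changed = True

    # At the fixpoint, m's interior cuts on A and B correspond one-to-one,
    # so splitting by A-offsets alone is sufficient.
    pieces = set()
    for m in matches:
        offsets = sorted({0, m.length} | {c - m.a for c in cuts_a if m.a < c < m.a + m.length})
        for lo, hi in zip(offsets, offsets[1:]):
            pieces.add(Match(m.a + lo, m.b + lo, hi - lo))
    return sorted(pieces)


def no_nonidentity_overlaps(pieces):
    """True if, on each sequence, any two piece intervals are identical or disjoint."""
    for side in ("a", "b"):
        spans = {(getattr(p, side), getattr(p, side) + p.length) for p in pieces}
        for s1 in spans:
            for s2 in spans:
                if s1 != s2 and s1[0] < s2[1] and s2[0] < s1[1]:
                    return False
    return True


# Two matches whose A-intervals [0, 10) and [5, 15) overlap without being identical.
print(refine([Match(0, 100, 10), Match(5, 200, 10)]))
# [Match(a=0, b=100, length=5), Match(a=5, b=105, length=5),
#  Match(a=5, b=200, length=5), Match(a=10, b=205, length=5)]
```

Each cut added in this loop is forced: it is either a segment boundary or the image of a forced cut under some match. The result should therefore keep pieces as large as the no-overlap condition allows. The repeated scans over all cuts make this quadratic or worse, which is why a practical method needs something more efficient.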
