Parametric Recomputing in Alignment Graphs

DNA/protein sequence alignments in computational molecular biology depend heavily on the settings of penalties for substitutions, insertions/deletions and gaps. Inappropriate choice of parameters causes irrelevant matches (“noise”) to be reported, thus obscuring biologically relevant matches. In practice, biologists frequently compare sequences in a few iterations, starting from a vague idea about appropriate parameters, then refining parameters to reduce noise. This procedure often helps to delineate biologically interesting similarities and to substantially reduce laborious analysis. This paper provides a computational underpinning for such iterative noise filtration in alignment graphs. Our main results assume that a preliminary “noisy” alignment, computed with reasonable but ad hoc parameters, is given; the problem is to modify the parameters to reduce noise. We present fast algorithms to refine penalty parameters and describe an application of these algorithms.
