A new method that simultaneously aligns and reconstructs ancestral sequences for any number of homologous sequences, when the phylogeny is given.

Alignment, phylogeny reconstruction, and the reconstruction of ancestral sequences are among the fundamental problems in molecular evolution and in the analysis of homologous sequences. This paper presents a fast, combined solution to these problems. The new algorithm gives an approximation to the minimal history in terms of a distance function on sequences. This distance is the minimal weighted path length constructed from substitutions and insertions-deletions (indels) of segments of any length. Substitutions are weighted with an arbitrary metric on the set of nucleotides or amino acids, and indels are weighted with a gap penalty function of the form $g_k = a + bk$, where $k$ is the length of the indel and $a$ and $b$ are two positive numbers. A novel feature is the introduction of sequence graphs, together with a generalization of the traditional dynamic programming algorithm for sequence comparison to the comparison of sequence graphs. Sequence graphs ease several computational problems. They can represent large sets of sequences, which can then be compared simultaneously. Furthermore, they allow multiple, equally good alignments to be retained, where previous methods were forced to make arbitrary choices. The method was implemented in a program written in C and was first tested on 22 5S RNA sequences.
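
To make the distance function concrete, the following is a minimal sketch of the pairwise case only: the minimal weighted path length between two plain sequences, with an indel of length k costing a + b*k. It uses the standard three-state dynamic programming recurrence for affine gap costs and is not the sequence-graph algorithm described above. The function names, the default values a = 2 and b = 1, and the unit-cost substitution metric are assumptions chosen for illustration, not values taken from the paper.

```python
import math


def unit_metric(x, y):
    """Assumed substitution metric: 0 for identical symbols, 1 otherwise."""
    return 0.0 if x == y else 1.0


def gap_penalty(k, a, b):
    """Cost of a single indel of length k >= 1: a + b*k."""
    return a + b * k


def affine_distance(s, t, a=2.0, b=1.0, sub=unit_metric):
    """Minimal weighted edit cost between s and t with affine gap costs.

    M[i][j]: best cost for s[:i], t[:j] ending in a match or substitution.
    X[i][j]: best cost ending in an indel that consumes symbols of s.
    Y[i][j]: best cost ending in an indel that consumes symbols of t.
    """
    n, m = len(s), len(t)
    inf = math.inf
    M = [[inf] * (m + 1) for _ in range(n + 1)]
    X = [[inf] * (m + 1) for _ in range(n + 1)]
    Y = [[inf] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    # Borders: one indel covering the whole prefix.
    for i in range(1, n + 1):
        X[i][0] = gap_penalty(i, a, b)
    for j in range(1, m + 1):
        Y[0][j] = gap_penalty(j, a, b)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Align s[i-1] with t[j-1].
            M[i][j] = min(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) \
                + sub(s[i - 1], t[j - 1])
            # Open a new indel (cost a + b) or extend the current one (cost b).
            X[i][j] = min(M[i - 1][j] + a + b, Y[i - 1][j] + a + b, X[i - 1][j] + b)
            Y[i][j] = min(M[i][j - 1] + a + b, X[i][j - 1] + a + b, Y[i][j - 1] + b)
    return min(M[n][m], X[n][m], Y[n][m])
```

With these assumed weights, comparing "ACGT" with "AT" is cheapest as a single deletion of the segment "CG", which costs a + 2b = 4. Two separate single-symbol deletions would cost 2(a + b) = 6, which is why the affine form favours one long indel over several short ones.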
