An improved algorithm for statistical alignment of sequences related by a star tree

The insertion-deletion model developed by Thorne, Kishino and Felsenstein (1991, J. Mol. Evol., 33, 114–124; the TKF91 model) provides a statistical framework for the alignment of two sequences. The statistical alignment of a set of sequences related by a star tree is a generalization of this model. The known algorithm computes the probability of a set of such sequences in $O(l^{2k})$ time, where $l$ is the geometric mean of the sequence lengths and $k$ is the number of sequences. An improved algorithm is presented whose running time is only $O(2^{2k} l^{k})$.
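To give a feel for the size of the gap between the two bounds, the following minimal sketch evaluates l^(2k) and 2^(2k) * l^k for sample values. It treats the asymptotic bounds as if they were exact operation counts and ignores constant factors, which is an assumption made purely for illustration; the function names are hypothetical and are not taken from the paper.

```python
def old_bound(l: int, k: int) -> int:
    """Leading term of the known algorithm's bound, l^(2k)."""
    return l ** (2 * k)


def improved_bound(l: int, k: int) -> int:
    """Leading term of the improved algorithm's bound, 2^(2k) * l^k."""
    return 2 ** (2 * k) * l ** k


def bound_ratio(l: int, k: int) -> float:
    """Ratio old/improved, which simplifies algebraically to (l / 4)^k."""
    return old_bound(l, k) / improved_bound(l, k)


if __name__ == "__main__":
    # Illustrative values only: k sequences of geometric-mean length l.
    for l, k in [(100, 2), (100, 3), (300, 4)]:
        print(f"l={l}, k={k}: old={old_bound(l, k):.3e}, "
              f"improved={improved_bound(l, k):.3e}, "
              f"ratio={bound_ratio(l, k):.3e}")
```

Because the ratio reduces to (l/4)^k, the improved bound is the smaller of the two whenever l exceeds 4, and the advantage grows with both the sequence length and the number of sequences.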
