A model of evolution and structure for multiple sequence alignment

We have developed a phylogeny-aware progressive alignment method that recognizes insertions and deletions as distinct evolutionary events and thus avoids systematic errors created by traditional alignment methods. We now extend this method to simultaneously model regional heterogeneity and evolution. This novel method can be flexibly adapted to alignment of nucleotide or amino acid sequences evolving under processes that vary over genomic regions and, being fully probabilistic, provides an estimate of regional heterogeneity of the evolutionary process along the alignment and a measure of local reliability of the solution. Furthermore, the evolutionary modelling of substitution process permits adjusting the sensitivity and specificity of the alignment and, if high specificity is aimed at, leaving sequences unaligned when their divergence is beyond a meaningful detection of homology.

[1]  O. Gotoh An improved algorithm for matching biological sequences. , 1982, Journal of molecular biology.

[2]  D. Metzler,et al.  Statistical Alignment with a Sequence Evolution Model Allowing Rate Heterogeneity along the Sequence , 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[3]  Richard Hughey,et al.  Hidden Markov models for detecting remote protein homologies , 1998, Bioinform..

[4]  J. Felsenstein,et al.  An evolutionary model for maximum likelihood alignment of DNA sequences , 1991, Journal of Molecular Evolution.

[5]  T. Jukes CHAPTER 24 – Evolution of Protein Molecules , 1969 .

[6]  S. Jeffery Evolution of Protein Molecules , 1979 .

[7]  I. Holmes,et al.  Using guide trees to construct multiple-sequence evolutionary HMMs , 2003, ISMB.

[8]  P. Hogeweg,et al.  The alignment of sets of sequences and the construction of phyletic trees: An integrated method , 2005, Journal of Molecular Evolution.

[9]  Ari Löytynoja,et al.  An algorithm for progressive multiple alignment of sequences with insertions. , 2005, Proceedings of the National Academy of Sciences of the United States of America.

[10]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[11]  H. Kishino,et al.  Dating of the human-ape splitting by a molecular clock of mitochondrial DNA , 2005, Journal of Molecular Evolution.

[12]  Jens Stoye,et al.  Benchmarking tools for the alignment of functional noncoding DNA , 2004, BMC Bioinformatics.

[13]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[14]  William Stafford Noble,et al.  Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project , 2007, Nature.

[15]  Sean R. Eddy,et al.  Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids , 1998 .

[16]  J. Thompson,et al.  CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. , 1994, Nucleic acids research.

[17]  Lior Pachter,et al.  Combining statistical alignment and phylogenetic footprinting to detect regulatory elements , 2008, Bioinform..

[18]  Kimmen Sjölander,et al.  SATCHMO: Sequence Alignment and Tree Construction Using Hidden Markov Models , 2003, Bioinform..

[19]  R. Nielsen,et al.  Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. , 1998, Genetics.

[20]  Jens Ledet Jensen,et al.  Recursions for statistical multiple alignment , 2003, Proceedings of the National Academy of Sciences of the United States of America.