The estimation of relative site variability among aligned homologous protein sequences

MOTIVATION: Maximum likelihood-based methods for estimating site-by-site substitution rate variability in aligned homologous protein sequences rely on the formulation of a phylogenetic tree and generally assume that the patterns of relative variability follow a pre-determined distribution. We present a phylogenetic tree-independent method to estimate the relative variability of individual sites within large datasets of homologous protein sequences. It is based upon two simple assumptions: first, that substitutions observed between two closely related sequences are likely, in general, to occur at the most variable sites; and second, that non-conservative amino acid substitutions tend to occur at more variable sites. Our methodology makes no assumptions regarding the underlying pattern of relative variability between sites.

RESULTS: Using data simulated under a non-gamma distributed model, we compared the performance of this approach with that of a maximum likelihood method that assumes gamma-distributed rates. At low mean rates of evolution, our method inferred site-by-site relative substitution rates more accurately than the maximum likelihood approach in the absence of prior assumptions about the relationships between sequences. Our method does not directly account for the effects of mutational saturation. However, we have incorporated an ad hoc modification that allows the accurate estimation of relative site variability in fast-evolving and saturated datasets.
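The two assumptions stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the scoring formula, so the function names (`p_distance`, `relative_variability`), the amino acid classes in `AA_CLASSES`, the `max_distance` cutoff and the `nonconservative_weight` are all hypothetical choices made for illustration. The sketch also omits the ad hoc saturation correction mentioned in the abstract.

```python
from itertools import combinations

# Hypothetical physico-chemical grouping of amino acids; an assumption made
# for this illustration, not taken from the paper.
AA_CLASSES = ["AVLIMC", "FWY", "STNQ", "DE", "KRH", "G", "P"]
AA_CLASS = {aa: i for i, group in enumerate(AA_CLASSES) for aa in group}


def p_distance(a, b):
    """Fraction of ungapped aligned positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def relative_variability(alignment, max_distance=0.3, nonconservative_weight=2.0):
    """Tree-free relative variability score per alignment column.

    Assumption 1: differences between closely related sequences count more,
    so each difference is weighted by (1 - p_distance) and distant pairs
    (p_distance > max_distance) are ignored.
    Assumption 2: non-conservative substitutions (residues in different
    classes) count more than conservative ones.
    Scores are rescaled so that their mean over all sites is 1.
    """
    n_sites = len(alignment[0])
    scores = [0.0] * n_sites
    for a, b in combinations(alignment, 2):
        d = p_distance(a, b)
        if d == 0.0 or d > max_distance:
            continue
        closeness = 1.0 - d
        for site, (x, y) in enumerate(zip(a, b)):
            if x == y or x == "-" or y == "-":
                continue
            class_x, class_y = AA_CLASS.get(x), AA_CLASS.get(y)
            conservative = class_x is not None and class_x == class_y
            scores[site] += closeness * (1.0 if conservative else nonconservative_weight)
    total = sum(scores)
    if total == 0:
        return scores  # no informative differences among close pairs
    mean = total / n_sites
    return [s / mean for s in scores]


if __name__ == "__main__":
    toy_alignment = ["MKTAYIAKQR", "MKTAYIAKQK", "MKTAYIDKQR"]
    for site, rate in enumerate(relative_variability(toy_alignment)):
        print(site, round(rate, 3))
```

In the toy alignment, the column with an A/D difference (non-conservative under the assumed classes) receives a higher relative score than the column with a K/R difference, and all invariant columns score zero. No distributional form, such as a gamma distribution, is imposed on the resulting scores.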
