Probabilistic Methods and Rate Heterogeneity

Evolutionary forces such as mutation, drift, and, to a certain extent, selection are stochastic in nature. It is thus not surprising that probabilistic models of sequence evolution quickly became the workhorse of molecular evolution research. The long, ongoing effort to model sequence evolution accurately stems from two different needs. The first is that of evolutionary biologists: models of sequence evolution allow us to test evolutionary hypotheses and to reconstruct phylogenetic trees and ancestral sequences [1–3]. The second is that of bioinformaticians and systems biologists: probabilistic/evolutionary methods are critical components in numerous applications. For example, the construction of similarity networks is based on all-against-all homology searches. Each pairwise evaluation is done using tools such as BLAST and BLAT [4, 5], which rely on evolutionary models. Additional examples include gene finding and genome annotation [6], alignment algorithms [7, 8], detection of genomic regions of high and low conservation [9, 10], prediction of transcription factor-binding sites [11], function prediction [12], and protein network analysis [13, 14]. In this chapter, we describe how probabilistic models are used to study

[1] P. Cochat, et al. Et al, 2008, Archives de pediatrie: organe officiel de la Societe francaise de pediatrie.

[2] Adi Stern, et al. A likelihood framework to analyse phyletic patterns, 2008, Philosophical Transactions of the Royal Society B: Biological Sciences.

[3] Kazutaka Katoh, et al. Recent developments in the MAFFT multiple sequence alignment program, 2008, Briefings Bioinform.

[4] Ziheng Yang, et al. Mutation-selection models of codon substitution and their use to estimate selective strengths on codon usage, 2008, Molecular biology and evolution.

[5] Marc A Suchard, et al. A nonparametric method for accommodating and testing across-site rate variation, 2007, Systematic biology.

[6] Rodrigo Lopez, et al. Clustal W and Clustal X version 2.0, 2007, Bioinform.

[7] E. Koonin, et al. Patterns of intron gain and conservation in eukaryotic genes, 2007, BMC Evolutionary Biology.

[8] Itay Mayrose, et al. Towards realistic codon models: among site variability and dependency of synonymous and non-synonymous rates, 2007, ISMB/ECCB.

[9] D. Liberles. Ancestral sequence reconstruction, 2007.

[10] Tal Pupko, et al. Phylogeny reconstruction: increasing the accuracy of pairwise distance estimation using Bayesian inference of evolutionary rates, 2007, Bioinform.

[11] G. Ast, et al. Comparative analysis identifies exonic splicing regulatory sequences--The complex definition of enhancers and silencers, 2006, Molecular cell.

[12] Adi Stern, et al. An evolutionary space-time model with varying among-site dependencies, 2006, Molecular biology and evolution.

[13] Sarel J Fleishman, et al. Transmembrane protein structures without X-rays, 2006, Trends in biochemical sciences.

[14] L. Hurst, et al. Hearing silence: non-neutral evolution at synonymous sites in mammals, 2006, Nature Reviews Genetics.

[15] M. Hasegawa, et al. Model of amino acid substitution in proteins encoded by mitochondrial DNA, 1996, Journal of Molecular Evolution.

[16] S. Muse, et al. Site-to-site variation of synonymous substitution rates, 2005, Molecular biology and evolution.

[17] Michael I. Jordan, et al. Protein Molecular Function Prediction by Bayesian Phylogenomics, 2005, PLoS Comput. Biol.

[18] Gail J. Bartlett, et al. Effective function annotation through catalytic residue conservation, 2005, Proceedings of the National Academy of Sciences of the United States of America.

[19] Francesca Chiaromonte, et al. Evaluation of regulatory potential and conservation scores for detecting cis-regulatory modules in aligned mammalian genome sequences, 2005, Genome research.

[20] Itay Mayrose, et al. ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures, 2005, Nucleic Acids Res.

[21] Zaid Abdo, et al. Evaluating the performance of a successive-approximations approach to parameter optimization in maximum-likelihood phylogeny estimation, 2005, Molecular biology and evolution.

[22] Simon D W Frost, et al. A simple hierarchical approach to modeling distributions of substitution rates, 2005, Molecular biology and evolution.

[23] Itay Mayrose, et al. A Gamma mixture model better accounts for among site rate heterogeneity, 2005, ECCB/JBI.

[24] K. Holsinger, et al. The effect of topology on estimates of among-site rate variation, 1996, Journal of Molecular Evolution.

[25] M. Kimura. A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences, 1980, Journal of Molecular Evolution.

[26] Naruya Saitou, et al. Property and efficiency of the maximum likelihood method for molecular phylogeny, 2005, Journal of Molecular Evolution.

[27] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum likelihood approach, 2005, Journal of Molecular Evolution.

[28] T. Pupko, et al. Site-Specific Evolutionary Rate Inference: Taking Phylogenetic Uncertainty into Account, 2005, Journal of Molecular Evolution.

[29] Alan M. Moses, et al. MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model, 2004, Genome Biology.

[30] N. Ben-Tal, et al. Comparison of site-specific rate-inference methods for protein sequences: empirical Bayesian methods are superior, 2004, Molecular biology and evolution.

[31] David L. Swofford, et al. Are Guinea Pigs Rodents? The Importance of Adequate Models in Molecular Phylogenetics, 1997, Journal of Mammalian Evolution.

[32] Ziheng Yang. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: Approximate methods, 1994, Journal of Molecular Evolution.

[33] Chris Field, et al. Estimation of rates-across-sites distributions in phylogenetic substitution models, 2003, Systematic biology.

[34] B. Snel, et al. Function prediction and protein networks, 2003, Current opinion in cell biology.

[35] Jon D. McAuliffe, et al. Phylogenetic Shadowing of Primate Sequences to Find Functional Regions of the Human Genome, 2003, Science.

[36] Santiago F. Elena, et al. A Sliding Window-Based Method to Detect Selective Constraints in Protein-Coding Genes and Its Application to RNA Viruses, 2002, Journal of Molecular Evolution.

[37] Kenneth Lange, et al. Applications of codon and rate variation models in molecular phylogeny, 2002, Molecular biology and evolution.

[38] Kenneth Lange, et al. Codon and rate variation models in molecular phylogeny, 2002, Molecular biology and evolution.

[39] L. Hurst. The Ka/Ks ratio: diagnosing the form of sequence evolution, 2002, Trends in genetics: TIG.

[40] Y. Inagaki, et al. Testing for differences in rates-across-sites distributions in phylogenetic subtrees, 2002, Molecular biology and evolution.

[41] W. S. Valdar, et al. Scoring residue conservation, 2002, Proteins.

[42] Itay Mayrose, et al. Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues, 2002, ISMB.

[43] W. J. Kent, et al. BLAT--the BLAST-like alignment tool, 2002, Genome research.

[44] Joseph Felsenstein, et al. Taking Variation of Evolutionary Rates Between Sites into Account in Inferring Phylogenies, 2001, Journal of Molecular Evolution.

[45] D. Swofford, et al. Should we use model-based methods for phylogenetic inference when we know that assumptions about among-site rate variation and nucleotide substitution pattern are violated?, 2001, Systematic biology.

[46] S. Whelan, et al. A general empirical model of protein evolution derived from multiple protein families using a maximum-likelihood approach, 2001, Molecular biology and evolution.

[47] P. Liò, et al. Molecular phylogenetics: state-of-the-art methods for looking into the past, 2001, Trends in genetics: TIG.

[48] N. Goldman, et al. Codon-substitution models for heterogeneous selection pressure at amino acid sites, 2000, Genetics.

[49] G. Stormo. Gene-finding approaches for eukaryotes, 2000, Genome research.

[50] P. Waddell, et al. Plastid Genome Phylogeny and a Model of Amino Acid Substitution for Proteins Encoded by Chloroplast DNA, 2000, Journal of Molecular Evolution.

[51] F J Ayala, et al. A new method for characterizing replacement rate variation in molecular sequences. Application of the Fourier and wavelet models to Drosophila and mammalian proteins, 2000, Genetics.

[52] M. Sogin, et al. Phylogeny of the genera Entamoeba and Endolimax as deduced from small-subunit ribosomal RNA sequences, 1999, Molecular biology and evolution.

[53] D. Swofford, et al. The Effect of Taxon Sampling on Estimating Rate Heterogeneity Parameters of Maximum-Likelihood Models, 1999.

[54] R. Nielsen, et al. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene, 1998, Genetics.

[55] David Posada, et al. MODELTEST: testing the model of DNA substitution, 1998, Bioinform.

[56] Thomas L. Madden, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs, 1997, Nucleic acids research.

[57] R. Nielsen, et al. Site-by-site estimation of the rate of substitution and the correlation of rates in mitochondrial DNA, 1997, Systematic biology.

[58] Z. Yang, et al. Among-site rate variation and its impact on phylogenetic analyses, 1996, Trends in ecology & evolution.

[59] J. Felsenstein, et al. A Hidden Markov Model approach to variation among sites in rate of evolution, 1996, Molecular biology and evolution.

[60] W. Li, et al. Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites, 1995, Molecular biology and evolution.

[61] Z. Yang, et al. Mixed model analysis of DNA sequence evolution, 1995, Biometrics.

[62] Z. Yang, et al. A space-time process model for the evolution of DNA sequences, 1995, Genetics.

[63] S. Muse, et al. A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome, 1994, Molecular biology and evolution.

[64] N. Goldman, et al. A codon-based model of nucleotide substitution for protein-coding DNA sequences, 1994, Molecular biology and evolution.

[65] J. Felsenstein, et al. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates, 1994, Molecular biology and evolution.

[66] Z. Yang, et al. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites, 1993, Molecular biology and evolution.

[67] William R. Taylor, et al. The rapid generation of mutation data matrices from protein sequences, 1992, Comput. Appl. Biosci.

[68] D. Mindell. Fundamentals of molecular evolution, 1991.

[69] J. Oliver, et al. The general stochastic model of nucleotide substitution, 1990, Journal of theoretical biology.

[70] L. Jin, et al. Limitations of the evolutionary parsimony method of phylogenetic analysis, 1990, Molecular biology and evolution.

[71] N. Saitou, et al. The neighbor-joining method: a new method for reconstructing phylogenetic trees, 1987, Molecular biology and evolution.

[72] G. Olsen, et al. Earliest phylogenetic branchings: comparing rRNA-based evolutionary trees inferred with various techniques, 1987, Cold Spring Harbor symposia on quantitative biology.

[73] M. Nei, et al. Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions, 1986, Molecular biology and evolution.

[74] S. Jeffery. Evolution of Protein Molecules, 1979.

[75] M. O. Dayhoff, et al. A Model of Evolutionary Change in Proteins, 1978.

[76] W. Fitch. Toward Defining the Course of Evolution: Minimum Change for a Specific Tree Topology, 1971.

[77] T. Jukes. Evolution of Protein Molecules, 1969.