Models of molecular evolution and phylogeny.

Phylogenetic reconstruction is a fast-growing field, enriched by a range of statistical approaches and by findings and applications across many areas of biology. Fundamental to this work are the mathematical models that describe patterns of DNA base substitution and amino acid replacement, which may also become basic models for comparative genome research. We discuss these models, including the analysis of observed DNA base and amino acid mutation patterns, the concept of site heterogeneity, and the incorporation of structural biology data, all of which have become particularly important in recent years. We also describe the use of such models in phylogenetic reconstruction and statistical methods for comparing different models.
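
As a concrete illustration of what a DNA base substitution model looks like, the sketch below uses the Jukes-Cantor (JC69) model, the simplest standard case, in which all four bases are equally frequent and every substitution occurs at the same rate. The abstract does not single out this model; it is chosen here only as an assumption for illustration, and the function names are hypothetical.

```python
import math


def jc69_transition_probs(d):
    """Return (p_same, p_diff) under JC69 for a branch of length d.

    d is the expected number of substitutions per site.
    p_same is the probability that a site shows the same base at both ends
    of the branch; p_diff is the probability of each specific different base.
    """
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def p_distance(seq_a, seq_b):
    """Proportion of sites that differ between two aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned and non-empty")
    return sum(x != y for x, y in zip(seq_a, seq_b)) / len(seq_a)


def jc69_distance(p):
    """JC69-corrected distance from an observed proportion of differing sites p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the JC69 correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

The correction matters because repeated substitutions at the same site hide earlier changes, so the observed proportion of differences underestimates the true number of substitutions per site. Richer models relax the equal-rate and equal-frequency assumptions, and site heterogeneity is commonly handled by letting the rate vary across sites.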
