Phylogenetics In silico sequence evolution with site-specific interactions along phylogenetic trees

Motivation: A biological sequence usually has many sites whose evolution depends on other positions of the sequence, but this is not accounted for by commonly used models of sequence evolution. Here we introduce a Markov model of nucleotide sequence evolution in which the instantaneous substitution rate at a site depends on the states of other sites. Based on the concept of neighbourhood systems, our model represents a universal description of arbitrarily complex dependencies among sites. Results: We show how to define complex models for some illustrative examples and demonstrate that our method provides a versatile resource for simulations of sequence evolution with site-specific interactions along a tree. For example, we are able to simulate the evolution of RNA taking into account secondary structure as well as pseudoknots and other tertiary interactions. To this end, we have developed a program, Simulating Site-Specific Interactions (SISSI), that simulates the evolution of a nucleotide sequence along a phylogenetic tree while incorporating user-defined site-specific interactions. Furthermore, our method allows the simulation of more complex interactions among nucleotide and other character-based sequences. Availability:
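The idea of a substitution rate that depends on the states of neighbouring sites can be illustrated with a small simulation along a single branch. The sketch below is not the SISSI implementation: the names `site_rates`, `evolve`, `partners` and `penalty` are hypothetical, the base rate is a Jukes-Cantor-like assumption, and the rule that a substitution is slowed by a factor `penalty` when the new nucleotide cannot form a Watson-Crick pair with a neighbour is chosen only for illustration. Events are drawn with a standard Gillespie-style scheme (exponential waiting times, events chosen in proportion to their rates).

```python
import random

NUCS = "ACGU"
# Watson-Crick pairs only; a simplifying assumption for this sketch.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def site_rates(seq, i, partners, penalty):
    """Rates of changing site i to each other nucleotide, given its neighbours."""
    rates = {}
    for b in NUCS:
        if b == seq[i]:
            continue
        rate = 1.0 / 3.0  # Jukes-Cantor-like base rate (assumption)
        for j in partners.get(i, ()):
            if (b, seq[j]) not in PAIRS:
                rate *= penalty  # new state would not pair with neighbour j
        rates[b] = rate
    return rates


def evolve(seq, partners, branch_length, penalty=0.1, rng=None):
    """Simulate substitutions along one branch with neighbour-dependent rates."""
    rng = rng or random.Random()
    seq = list(seq)
    t = 0.0
    while True:
        # Every possible single-site change, with its current rate.
        events = [
            (i, b, r)
            for i in range(len(seq))
            for b, r in site_rates(seq, i, partners, penalty).items()
        ]
        total = sum(r for _, _, r in events)
        if total == 0.0:
            return "".join(seq)  # no substitution is possible
        t += rng.expovariate(total)  # waiting time to the next substitution
        if t > branch_length:
            return "".join(seq)
        i, b, _ = rng.choices(events, weights=[r for _, _, r in events])[0]
        seq[i] = b  # rates are recomputed from the new state on the next pass


# Hypothetical hairpin: sites 0-2 pair with sites 8-6, sites 3-5 form the loop.
partners = {0: [8], 1: [7], 2: [6], 6: [2], 7: [1], 8: [0]}
print(evolve("GGGAAACCC", partners, branch_length=1.0, rng=random.Random(1)))
```

Here the neighbourhood system is the `partners` dictionary. A site may list several neighbours, so pseudoknots or other tertiary contacts could be expressed in the same way under these assumptions. Extending the sketch to a tree would mean calling `evolve` once per branch, starting each child from the sequence of its parent.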
