Co-evolution and information signals in biological sequences

The information content of a pool of sequences has been defined in information theory through entropic measures that capture the amount of variability within sequences. For protein-coding biological sequences, a first approach is to align the sequences and estimate the probability of each amino acid occurring at each alignment position. These values are then combined through an "entropy" function. This function is maximal (equivalently, the information content is minimal) when every amino acid is equally likely to occur at every position. This model treats each alignment position independently. That makes it too restrictive for evaluating the sequence constraints that must be conserved to maintain protein function under random mutation. In fact, the co-evolution of amino acids at pairs or tuples of positions constitutes a fine-grained signal of important structural, functional and mechanical information for protein families. It is clear that classical information theory should be revisited when applied to biological data. A large number of approaches to the co-evolution of biological sequences have been developed in the last decade. We present a few of them and discuss their limitations. We also discuss related questions that appear crucial for new advances in structural bioinformatics, such as generating random structures to validate predictions based on co-evolution.
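To make the contrast between per-position entropy and pairwise co-evolution concrete, here is a minimal sketch. It computes the classical Shannon entropy of each alignment column and the raw mutual information between two columns. It assumes an ungapped alignment given as equal-length strings and uses plain frequency estimates, with no pseudocounts, gap handling, sequence weighting or phylogenetic correction. It is not any specific method surveyed here. The function names `column_entropy` and `mutual_information` and the toy alignment are hypothetical, chosen for illustration only. For 20 equally likely amino acids the column entropy reaches its maximum of log2(20) ≈ 4.32 bits.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, from observed frequencies.

    Hypothetical helper: no pseudocounts, no gap handling, no sequence weighting.
    """
    n = len(column)
    counts = Counter(column)
    # H = sum_a p_a * log2(1 / p_a), with p_a = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def mutual_information(col_i, col_j):
    """Raw mutual information (bits) between two alignment columns.

    Hypothetical helper: MI = sum_{a,b} p_ab * log2(p_ab / (p_a * p_b)),
    estimated from observed (pair) frequencies only.
    """
    if len(col_i) != len(col_j):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p_ab / (p_a * p_b) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_i[a] * count_j[b]))
    return mi


# Toy ungapped alignment (invented for illustration): column 1 is fully
# conserved, columns 0 and 3 co-vary perfectly, columns 0 and 2 vary independently.
alignment = ["ACDE", "ACKE", "GCDR", "GCKR"]
cols = list(zip(*alignment))  # transpose: one tuple of residues per position

print([column_entropy(c) for c in cols])      # [1.0, 0.0, 1.0, 1.0]
print(mutual_information(cols[0], cols[3]))   # 1.0
print(mutual_information(cols[0], cols[2]))   # 0.0
```

Columns 0, 2 and 3 have identical entropy (1 bit). Per-position entropy therefore cannot tell the co-varying pair (0, 3) from the independent pair (0, 2); only the pairwise measure separates them. Raw mutual information of this kind is also known to be confounded by column variability and shared ancestry. Practical co-evolution methods add corrections for these effects.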
