ConSurf 2016: an improved methodology to estimate and visualize evolutionary conservation in macromolecules

The degree of evolutionary conservation of an amino acid in a protein or a nucleic acid in DNA/RNA reflects a balance between its natural tendency to mutate and the overall need to retain the structural integrity and function of the macromolecule. The ConSurf web server (http://consurf.tau.ac.il), established over 15 years ago, analyses the evolutionary pattern of the amino/nucleic acids of the macromolecule to reveal regions that are important for structure and/or function. Starting from a query sequence or structure, the server automatically collects homologues, infers their multiple sequence alignment and reconstructs a phylogenetic tree that reflects their evolutionary relations. These data are then used, within a probabilistic framework, to estimate the evolutionary rates of each sequence position. Here we introduce several new features into ConSurf, including automatic selection of the best evolutionary model used to infer the rates, the ability to homology-model query proteins, prediction of the secondary structure of query RNA molecules from sequence, the ability to view the biological assembly of a query (in addition to the single chain), mapping of the conservation grades onto 2D RNA models and an advanced view of the phylogenetic tree that enables interactively rerunning ConSurf with the taxa of a sub-tree.
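
The rate-inference step described above can be illustrated with a small sketch. It is not ConSurf's implementation, and it makes several simplifying assumptions. It uses a nucleotide alignment and the Jukes-Cantor (JC69) substitution model in place of the models ConSurf selects among. It takes a fixed tree with known branch lengths. It uses four equally weighted rate categories instead of a fitted gamma distribution. The tree, taxon names, branch lengths and rate values are all hypothetical. For each alignment column, the sketch computes the likelihood under each rate category with Felsenstein's pruning algorithm. It then reports the posterior mean rate, which is an empirical Bayes estimate. A lower rate indicates a more conserved position.

```python
import math

# Hypothetical nucleotide alphabet; JC69 is an assumed stand-in for the
# substitution models ConSurf selects among.
ALPHABET = "ACGT"


def jc69_prob(i, j, t):
    """Probability that state i becomes state j over t expected substitutions (JC69)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def partial_likelihood(node, column, rate):
    """Felsenstein pruning: P(observed leaves below node | each state at node).

    A leaf is a taxon name (str); an internal node is a list of
    (child, branch_length) pairs. Branch lengths are scaled by the site rate.
    """
    if isinstance(node, str):
        return [1.0 if s == column[node] else 0.0 for s in ALPHABET]
    result = [1.0] * len(ALPHABET)
    for child, length in node:
        child_vec = partial_likelihood(child, column, rate)
        for a, sa in enumerate(ALPHABET):
            result[a] *= sum(
                jc69_prob(sa, sb, rate * length) * child_vec[b]
                for b, sb in enumerate(ALPHABET)
            )
    return result


def site_likelihood(tree, column, rate):
    """Likelihood of one alignment column; JC69 root frequencies are uniform."""
    return sum(0.25 * p for p in partial_likelihood(tree, column, rate))


def posterior_mean_rate(tree, column, rates):
    """Empirical Bayes site rate over equally weighted rate categories.

    With a uniform prior the category weights cancel, so the posterior of
    category k is proportional to the column likelihood at rate k.
    """
    likes = [site_likelihood(tree, column, r) for r in rates]
    return sum(r * l for r, l in zip(rates, likes)) / sum(likes)


# Hypothetical four-taxon tree ((A,B),(C,D)) with made-up branch lengths.
TREE = [
    ([("A", 0.1), ("B", 0.2)], 0.05),
    ([("C", 0.1), ("D", 0.3)], 0.05),
]
# Assumed rate categories with mean 1, standing in for a discretized gamma.
RATES = [0.1, 0.5, 1.0, 2.4]

if __name__ == "__main__":
    conserved = {"A": "G", "B": "G", "C": "G", "D": "G"}
    variable = {"A": "A", "B": "C", "C": "G", "D": "T"}
    for name, col in [("conserved", conserved), ("variable", variable)]:
        print(name, round(posterior_mean_rate(TREE, col, RATES), 3))
```

In this sketch, the invariant column receives a posterior mean rate below 1 and the fully variable column receives one above 1. Turning such rates into conservation grades is a separate step and is not shown here.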
