Integrated Assessment of Genomic Correlates of Protein Evolutionary Rate

Rates of evolution differ widely among proteins, but the causes and consequences of these differences remain under debate. With the advent of high-throughput functional genomics, it is now possible to rigorously assess the genomic correlates of protein evolutionary rate. However, dissecting the correlations among evolutionary rate and these genomic features remains a major challenge. Here, we use an integrated probabilistic modeling approach to study genomic correlates of protein evolutionary rate in Saccharomyces cerevisiae. We measure and rank degrees of association between (i) an approximate measure of protein evolutionary rate with high genome coverage, and (ii) a diverse list of protein properties (sequence, structural, functional, network, and phenotypic). We observe, among many statistically significant correlations, that slowly evolving proteins tend to be regulated by more transcription factors, deficient in predicted structural disorder, involved in characteristic biological functions (such as translation), and biased in amino acid composition, and that they are generally more abundant, more essential, and enriched for interaction partners. Many of these results are in agreement with recent studies. In addition, we assess the information contribution of different subsets of these protein properties to the task of predicting slowly evolving proteins. We employ a logistic regression model on binned data that is able to account for intercorrelation, non-linearity, and heterogeneity within features. Our model considers features both individually and in natural ensembles (“meta-features”) in order to assess their joint information contribution and the degree to which their contributions are independent. Meta-features based on protein abundance and amino acid composition make strong, partially independent contributions to the task of predicting slowly evolving proteins; other meta-features make additional minor contributions. The combination of all meta-features yields predictions comparable to those based on paired species comparisons, and approaching the predictive limit of optimal lineage-insensitive features. Our integrated assessment framework can be readily extended to other correlational analyses at the genome scale.
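To make the idea of logistic regression on binned data concrete, the following is a minimal sketch and not the authors' implementation. It assumes quantile binning, one-hot encoding of bins, and plain batch gradient descent; the feature name `abundance`, the bin count, the learning rate, and the synthetic labels are all hypothetical choices made for illustration. The synthetic relationship is deliberately non-monotonic (it is not a claim about real yeast data) to show how per-bin weights can capture a non-linear dependence that a single linear weight could not.

```python
import math
import random

def quantile_bin_edges(values, n_bins):
    """Return n_bins - 1 cut points giving roughly equal-sized bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]

def one_hot_bin(value, edges):
    """Encode a value as a one-hot vector over len(edges) + 1 bins."""
    index = sum(value >= e for e in edges)
    return [1.0 if i == index else 0.0 for i in range(len(edges) + 1)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Predicted probability of the positive class for encoded row x."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))

def fit_logistic(rows, labels, lr=0.5, epochs=300):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(rows), len(rows[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict(w, b, x) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        b -= lr * grad_b / n
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return w, b

# Synthetic data (an assumption for illustration only): one hypothetical
# feature in [0, 1]; label 1 marks a "slowly evolving" protein and is set
# for mid-range values only, so the dependence is non-monotonic.
rng = random.Random(0)
abundance = [rng.uniform(0.0, 1.0) for _ in range(400)]
slow = [1 if 0.3 < a < 0.7 else 0 for a in abundance]

edges = quantile_bin_edges(abundance, n_bins=5)
rows = [one_hot_bin(a, edges) for a in abundance]
w, b = fit_logistic(rows, slow)

# Each bin gets its own weight, so predictions can rise and then fall.
p_low = predict(w, b, one_hot_bin(0.05, edges))
p_mid = predict(w, b, one_hot_bin(0.50, edges))
p_high = predict(w, b, one_hot_bin(0.95, edges))
print(p_low, p_mid, p_high)
```

In this sketch, a meta-feature would correspond to concatenating the one-hot encodings of several related features into one row before fitting; comparing the predictive performance of models fitted on different subsets is one plausible way to gauge how much independent information each subset contributes.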
