Tissue-Specific Evolution of Protein Coding Genes in Human and Mouse

Protein-coding genes evolve at different rates, and the influence of different parameters, from gene size to expression level, has been extensively studied. While in yeast gene expression level is the major causal factor of gene evolutionary rate, the situation is more complex in animals. Here we investigate these relations further, taking into account in particular gene expression in different organs as well as indirect correlations between parameters. We used RNA-seq data from two large datasets, covering 22 mouse tissues and 27 human tissues. Across all tissues, evolutionary rate correlates only weakly with levels and breadth of expression. The strongest explanatory factors of purifying selection are GC content, expression in many developmental stages, and expression in brain tissues. While the main component of evolutionary rate is purifying selection, we also find tissue-specific patterns for sites under neutral evolution and for positive selection. We observe fast evolution of genes expressed in testis, but also in other tissues, notably liver, which is explained by weak purifying selection rather than by positive selection.
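
One common way to separate a direct correlation from an indirect one is a partial rank correlation: correlate two parameters after removing the part of each that is explained by a third. The sketch below is only an illustration of that general idea and is not taken from the study; the function names (`rank`, `partial_spearman`) and the variable roles (for example evolutionary rate, expression level, and GC content) are hypothetical.

```python
import numpy as np

def rank(values):
    """Return ranks 1..n, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def partial_spearman(x, y, z):
    """Spearman correlation of x and y after controlling for z.

    Illustrative assumption: x could be an evolutionary rate, y an
    expression level, and z a covariate such as GC content.
    """
    rx, ry, rz = rank(x), rank(y), rank(z)
    # Regress the ranks of x and y on the ranks of z (with intercept)
    design = np.column_stack([np.ones_like(rz), rz])
    res_x = rx - design @ np.linalg.lstsq(design, rx, rcond=None)[0]
    res_y = ry - design @ np.linalg.lstsq(design, ry, rcond=None)[0]
    # Correlate what is left once z has been accounted for
    return float(np.corrcoef(res_x, res_y)[0, 1])
```

If two parameters are related only because both depend on the third, the plain rank correlation can be strong while the partial correlation stays close to zero, which is the signature of an indirect relationship.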
