Tissue-Specificity of Gene Expression Diverges Slowly between Orthologs, and Rapidly between Paralogs

The ortholog conjecture implies that functional similarity between orthologous genes is higher than between paralogs. It has been supported using levels of expression and Gene Ontology term analysis, although the evidence was rather weak and there were also conflicting reports. In this study on 12 species we provide strong evidence of high conservation in tissue-specificity between orthologs, in contrast to low conservation between within-species paralogs. This allows us to shed new light on the evolution of gene expression patterns. While there have been several studies of the correlation of expression between species, little is known about the evolution of tissue-specificity itself. Ortholog tissue-specificity is strongly conserved between all tetrapod species, with the lowest Pearson correlation between mouse and frog at r = 0.66. Tissue-specificity correlation decreases strongly with divergence time. Paralogs in human show much lower conservation, even for recent primate-specific paralogs. When both paralogs from an ancient whole-genome duplication are tissue-specific, it is often to different tissues, while other tissue-specific paralogs are mostly specific to the same tissue. The same patterns are observed using human or mouse as focal species, and are robust to choices of datasets and of thresholds. Our results support the following model of evolution: in the absence of duplication, tissue-specificity evolves slowly, and tissue-specific genes do not change their main tissue of expression; after small-scale duplication, the less expressed paralog loses the ancestral specificity, leading to an immediate difference between paralogs; over time, both paralogs become more broadly expressed, but remain poorly correlated. Finally, a small number of paralog pairs stay tissue-specific with the same main tissue of expression for at least 300 million years.
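As an illustration of the kind of comparison described above, the following minimal sketch scores each gene's tissue-specificity and then correlates the scores across gene pairs. The abstract does not say which specificity metric was used; the sketch assumes the widely used Tau index (0 for uniformly expressed genes, 1 for genes expressed in a single tissue). The function names `tau` and `pearson` and the toy expression values are hypothetical and are not taken from the study.

```python
from math import sqrt


def tau(expression):
    """Tau tissue-specificity index for one gene (assumed metric, see lead-in).

    `expression` holds one non-negative, already normalized value per tissue.
    Returns 0.0 for uniform expression and 1.0 for single-tissue expression.
    """
    values = [float(v) for v in expression]
    if len(values) < 2:
        raise ValueError("need at least two tissues")
    peak = max(values)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    return sum(1.0 - v / peak for v in values) / (len(values) - 1)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (invented): expression of three gene pairs across four tissues.
# Each tuple is (profile in species or copy A, profile in species or copy B).
gene_pairs = [
    ([9.0, 1.0, 0.5, 0.2], [8.0, 1.5, 0.4, 0.3]),  # specific in both
    ([5.0, 5.2, 4.8, 5.1], [4.9, 5.0, 5.3, 5.0]),  # broad in both
    ([7.0, 3.0, 2.0, 1.0], [6.0, 2.5, 2.5, 1.0]),  # intermediate in both
]

tau_a = [tau(a) for a, _ in gene_pairs]
tau_b = [tau(b) for _, b in gene_pairs]

# Correlation of tissue-specificity across pairs, analogous to the
# per-species-pair r values reported in the abstract.
r = pearson(tau_a, tau_b)
print(f"Pearson r between Tau values: {r:.3f}")
```

The same two functions could be applied to paralog pairs within one species instead of ortholog pairs between species; only the way the pairs are assembled would change.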