Correcting for the bias due to expression specificity improves the estimation of constrained evolution of expression between mouse and human

Motivation: Comparative analyses of gene expression data from different species have become an important component of the study of molecular evolution. Thus, methods are needed to estimate evolutionary distances between expression profiles, as well as a neutral reference against which to estimate selective pressure. Divergence between expression profiles of homologous genes is often calculated with Pearson's or Euclidean distance. Neutral divergence is usually inferred from randomized data. Despite being widely used, neither of these two steps has been well studied. Here, we analyze these methods formally and on real data, highlight their limitations and propose improvements.

Results: It has been demonstrated that Pearson's distance, in contrast to Euclidean distance, leads to underestimation of the expression similarity between homologous genes with a conserved uniform pattern of expression. Here, we first extend this study to genes with a conserved but specific pattern of expression. Surprisingly, we find that both Pearson's and Euclidean distances, when used as measures of expression similarity between genes, depend on the expression specificity of those genes. We also show that the Euclidean distance depends strongly on data normalization. Next, we show that the randomization procedure that is widely used to estimate the rate of neutral evolution is biased when broadly expressed genes are abundant in the data. To overcome this problem, we propose a novel randomization procedure that is unbiased with respect to the expression profiles present in the datasets. Applying our method to mouse and human gene expression data suggests significant gene expression conservation between these species.

Contact: marc.robinson-rechavi@unil.ch; sven.bergmann@unil.ch

Supplementary information: Supplementary data are available at Bioinformatics online.
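The dependence of Pearson's distance on expression specificity can be illustrated with a toy example. The sketch below is not the authors' analysis: the six-tissue profiles, the noise values, and the helper names (`pearson_distance`, `euclidean_distance`, `shuffled_reference`) are all hypothetical, and Pearson's distance is assumed to be defined as one minus the correlation coefficient. The same small perturbations are added to a broadly expressed profile and to a tissue-specific one, so the Euclidean distance is identical in both cases by construction, while Pearson's distance differs sharply. The last function sketches one simple form of randomization, assumed here to be the pairing of each gene with a randomly chosen gene from the other species; it is not the procedure proposed in the paper.

```python
import math
import random


def pearson_distance(x, y):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return 1 - cov / math.sqrt(var_x * var_y)


def euclidean_distance(x, y):
    """Return the Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def shuffled_reference(human_profiles, mouse_profiles, distance, rng):
    """Distances between randomly re-paired genes (assumed naive reference)."""
    shuffled = list(mouse_profiles)
    rng.shuffle(shuffled)
    return [distance(h, m) for h, m in zip(human_profiles, shuffled)]


# Hypothetical conserved profiles over six tissues (arbitrary log-scale units).
broad = [5.0, 5.0, 5.0, 5.0, 5.0, 5.0]      # uniform expression
specific = [1.0, 1.0, 1.0, 1.0, 1.0, 6.0]   # expressed mainly in one tissue

# Hypothetical small perturbations, identical for both genes.
noise_human = [0.1, -0.1, 0.0, 0.1, -0.1, 0.0]
noise_mouse = [-0.1, 0.1, 0.0, 0.0, 0.1, -0.1]

human_broad = [v + e for v, e in zip(broad, noise_human)]
mouse_broad = [v + e for v, e in zip(broad, noise_mouse)]
human_specific = [v + e for v, e in zip(specific, noise_human)]
mouse_specific = [v + e for v, e in zip(specific, noise_mouse)]

print("Pearson, broad:     ", pearson_distance(human_broad, mouse_broad))
print("Pearson, specific:  ", pearson_distance(human_specific, mouse_specific))
print("Euclidean, broad:   ", euclidean_distance(human_broad, mouse_broad))
print("Euclidean, specific:", euclidean_distance(human_specific, mouse_specific))

reference = shuffled_reference(
    [human_broad, human_specific],
    [mouse_broad, mouse_specific],
    euclidean_distance,
    random.Random(0),
)
print("Shuffled-pair Euclidean distances:", reference)
```

In this toy setting the broadly expressed gene looks highly divergent under Pearson's distance even though its expression pattern is conserved, because the correlation is driven entirely by noise. The example does not reproduce the dependence of Euclidean distance on specificity or normalization reported in the abstract.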
