Statistical analysis of differential gene expression relative to a fold change threshold on NanoString data of mouse odorant receptor genes

Background: A challenge in gene expression studies is the reliable identification of differentially expressed genes. In many high-throughput studies, genes are accepted as differentially expressed only if they simultaneously satisfy a p value criterion and a fold change criterion. A statistical method, TREAT, has been developed for microarray data to assess formally whether fold changes are significantly higher than a predefined threshold. We have recently applied the NanoString digital platform to study the expression of mouse odorant receptor genes, which, with 1,200 members, form the largest gene family in the mouse genome. Our objectives, on these data, are to decrease false discoveries when formally assessing the genes relative to a fold change threshold, and to provide guidance in the choice of this threshold.

Results: Statistical tests have been developed for microarray data to identify genes that are differentially expressed relative to a fold change threshold. Here we report that another approach, which we refer to as tTREAT, is more appropriate for our NanoString data, where false discoveries lead to costly and time-consuming follow-up experiments. Methods that we refer to as tTREAT2 and the running fold change model improve the performance of the statistical tests by protecting or selecting the fold change threshold more objectively. We show the benefits on simulated and real data.

Conclusions: Gene-wise statistical analyses of gene expression data, for which significance relative to a fold change threshold is important, give reproducible and reliable results on NanoString data of mouse odorant receptor genes. Because it can be difficult to set in advance a fold change threshold that is meaningful for the available data, we developed methods that enable a better choice (thus reducing false discoveries and/or missed genes) or avoid this choice altogether. This set of tools may be useful for the analysis of other types of gene expression data.
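To make the idea of testing a fold change against a threshold concrete, the following is a minimal sketch, not the authors' tTREAT, tTREAT2, or running fold change model, whose details are not given in this abstract. It assumes log2 expression values, two groups, and an ordinary pooled-variance t-statistic (TREAT itself uses empirical Bayes moderated statistics). The function names `t_sf` and `threshold_t_test`, the example values, and the two-tail form of the p value for the interval null hypothesis are all assumptions made for illustration.

```python
import math
from statistics import mean, variance


def t_sf(t, df, steps=2000):
    """Upper-tail probability P(T > t) of Student's t (Simpson's rule, stdlib only)."""
    if t < 0:
        return 1.0 - t_sf(-t, df, steps)
    if t == 0:
        return 0.5
    # Log of the normalising constant of the t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = t / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)


def threshold_t_test(group_a, group_b, lfc):
    """Test H0: |delta| <= lfc against H1: |delta| > lfc.

    group_a, group_b: log2 expression values of one gene in two conditions.
    lfc: log2 fold change threshold (lfc = 0 gives the usual two-sided t-test).
    Returns (observed log2 fold change, p value).
    """
    na, nb = len(group_a), len(group_b)
    diff = mean(group_b) - mean(group_a)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(group_a)
                  + (nb - 1) * variance(group_b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    # Assumed p value: sum of the two tail areas measured from the
    # boundaries +lfc and -lfc of the interval null hypothesis.
    p = t_sf((abs(diff) - lfc) / se, df) + t_sf((abs(diff) + lfc) / se, df)
    return diff, min(1.0, p)


if __name__ == "__main__":
    control = [5.0, 5.2, 4.9]   # hypothetical log2 counts
    mutant = [7.1, 7.0, 7.3]
    for threshold in (0.0, 1.0, 3.0):
        print(threshold, threshold_t_test(control, mutant, threshold))
```

In this sketch, raising the threshold can only increase the p value of a given gene, which is why a gene with a small but very consistent change may pass an ordinary t-test and still fail the threshold test.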
