Screening for Differentially Expressed Genes: Are Multilevel Models Helpful?

Screening for changes in gene expression across biological conditions using high-throughput technologies is now common in biology. In this paper we present a broad Bayesian multilevel framework for developing computationally fast shrinkage-based screening tools for this purpose. Our scheme makes it easy to adapt the choice of statistics to the goals of the analysis and to the genomic distributions of signal and noise. We empirically investigate the extent to which these shrinkage-based statistics improve performance, and the situations in which such improvements are larger. Our evaluation uses both extensive simulations and controlled biological experiments. The experimental data include a so-called spike-in experiment, in which the target biological signal is known, and a two-sample experiment, which illustrates the typical conditions in which the methods are applied. Our results emphasize two important practical concerns that are not receiving sufficient attention in applied work in this area. First, while shrinkage strategies based on multilevel models are able to improve selection performance, they require careful verification of the assumptions on the relationship between signal and noise. Incorrect specification of this relationship can negatively affect a selection procedure. Because this inter-gene relationship is generally identifiable in genomic experiments, we suggest a simple diagnostic plot to assist model checking. Second, no statistic performs optimally across two common categories of experimental goals: selecting genes with large changes, and selecting genes with reliably measured changes. Therefore, careful consideration of analysis goals is critical in the choice of the approach taken.
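To make the idea of a shrinkage-based screening statistic concrete, the following is a minimal sketch and not the framework developed in the paper. It assumes a two-sample design with log-expression matrices of shape (genes, replicates), and it shrinks each gene's pooled variance toward the genome-wide average variance. The function name `shrunken_t` and the prior weight `nu0` are hypothetical, chosen only for illustration. The returned quantities correspond to the two goals distinguished above: ranking by `diff` favors genes with large changes, while ranking by a t-type statistic favors genes with reliably measured changes.

```python
import numpy as np

def shrunken_t(x, y, nu0=4.0):
    """Illustrative variance-shrinkage t statistic (assumed form, not the paper's).

    x, y : arrays of shape (genes, replicates), one per condition.
    nu0  : hypothetical prior weight; nu0 = 0 gives the ordinary pooled t.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n_x, n_y = x.shape[1], y.shape[1]
    df = n_x + n_y - 2

    # Signal: per-gene difference in mean expression.
    diff = x.mean(axis=1) - y.mean(axis=1)

    # Noise: per-gene pooled sample variance.
    pooled_var = ((n_x - 1) * x.var(axis=1, ddof=1)
                  + (n_y - 1) * y.var(axis=1, ddof=1)) / df

    # Shrink each gene's variance toward the genome-wide mean variance.
    prior_var = pooled_var.mean()
    shrunk_var = (nu0 * prior_var + df * pooled_var) / (nu0 + df)

    scale = np.sqrt(1.0 / n_x + 1.0 / n_y)
    t_plain = diff / (np.sqrt(pooled_var) * scale)
    t_shrunk = diff / (np.sqrt(shrunk_var) * scale)
    return diff, t_plain, t_shrunk
```

Shrinking toward a single common variance implicitly assumes that noise is unrelated to signal across genes. Plotting `diff` against the per-gene standard deviation is one simple way to check whether that assumption is plausible for a given data set.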