Fully Bayesian Analysis of RNA-seq Counts for the Detection of Gene Expression Heterosis

ABSTRACT Heterosis, or hybrid vigor, is the enhancement of the phenotype of hybrid progeny relative to their inbred parents. Although heterosis is used extensively in agriculture, its underlying mechanisms remain unclear. To investigate the molecular basis of phenotypic heterosis, researchers search tens of thousands of genes for heterosis with respect to expression in the transcriptome. Assessing heterosis is difficult because the null hypotheses are composite and p-values are nonuniformly distributed under these null hypotheses. We therefore develop a general hierarchical model for count data and a fully Bayesian analysis in which an efficient parallelized Markov chain Monte Carlo algorithm ameliorates the computational burden. We use our method to detect gene expression heterosis in a two-hybrid plant-breeding scenario, both in a real RNA-seq maize dataset and in simulation studies. In the simulation studies, we show that our method has well-calibrated posterior probabilities and credible intervals when the model assumed in the analysis matches the model used to simulate the data. Although model misspecification can adversely affect calibration, the methodology can still rank genes accurately. Finally, we show that hyperparameter posteriors are extremely narrow and that an empirical Bayes (eBayes) approach based on posterior means from the fully Bayesian analysis provides virtually equivalent posterior probabilities, credible intervals, and gene rankings relative to the fully Bayesian solution. This evidence of equivalence supports the use of eBayes procedures in RNA-seq data analysis if accurate hyperparameter estimates can be obtained. Supplementary materials for this article are available online.
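
As a rough illustration of how a posterior probability of heterosis can be read off MCMC output, the sketch below estimates the probability that a hybrid's expression exceeds that of both parents (high-parent heterosis) as the fraction of posterior draws in which that inequality holds. This is a minimal sketch under stated assumptions, not the article's implementation: the function name, the argument names, and the idea of storing one gene's draws as three aligned lists of log-scale mean expression are all hypothetical, and the article's actual model parameterization is not reproduced here.

```python
from typing import Sequence


def high_parent_heterosis_probability(
    parent1: Sequence[float],
    parent2: Sequence[float],
    hybrid: Sequence[float],
) -> float:
    """Monte Carlo estimate of P(hybrid > max(parent1, parent2) | data).

    Each argument holds posterior draws for one gene (hypothetical layout):
    element i of every sequence comes from the same MCMC iteration.
    """
    if not (len(parent1) == len(parent2) == len(hybrid)):
        raise ValueError("all draw sequences must have the same length")
    if len(hybrid) == 0:
        raise ValueError("at least one posterior draw is required")

    # Count the iterations in which the hybrid exceeds the higher parent.
    hits = sum(
        1
        for p1, p2, h in zip(parent1, parent2, hybrid)
        if h > max(p1, p2)
    )
    return hits / len(hybrid)


# Toy draws for a single gene (made-up numbers, for illustration only).
p1_draws = [1.0, 1.2, 0.9, 1.1]
p2_draws = [1.5, 1.4, 1.6, 1.3]
hy_draws = [1.7, 1.3, 1.8, 1.6]
prob = high_parent_heterosis_probability(p1_draws, p2_draws, hy_draws)
```

Because the event is defined directly on the joint draws, the composite hypothesis needs no reference distribution for a p-value. Genes can then be ranked by this estimated probability.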
