Detecting Multivariate Gene Interactions in RNA-Seq Data Using Optimal Bayesian Classification

Differential gene expression testing is an analysis commonly applied to RNA-Seq data. These statistical tests identify genes whose expression differs significantly across phenotypes. We extend this testing paradigm to multivariate gene interactions from a classification perspective, with the goal of detecting novel gene interactions for the phenotypes of interest. This is achieved through a novel computational framework consisting of a hierarchical statistical model of the RNA-Seq processing pipeline and the corresponding optimal Bayesian classifier (OBC). Using Markov chain Monte Carlo (MCMC) sampling and Monte Carlo integration, we compute quantities for which no closed-form expression exists. We illustrate the performance on an expression dataset from a dietary intervention study, where we identify gene pairs that have low classification error yet were not identified as differentially expressed. Additionally, we have released an open-source software package that performs OBC classification on RNA-Seq data; it is available at http://bit.ly/obc_package.
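The OBC assigns an observation x to the class y that maximizes c_y * f(x | y), where c_y is the class prior and the effective density f(x | y) is the class-conditional density averaged over the posterior of the model parameters. When that average has no closed form, it can be approximated by averaging over posterior draws. The abstract does not spell out the hierarchical model, so the following is only a minimal sketch under stated assumptions: a Poisson-lognormal model for one gene pair (counts x_g ~ Poisson(s * exp(lambda_g)), with latent log-expression lambda ~ N(mu_y, Sigma_y)), and posterior draws of (mu_y, Sigma_y) assumed to be already available from an MCMC run that is not implemented here. All function names are hypothetical and are not the API of the released package.

```python
import math
import random

def poisson_logpmf(k, rate):
    """log P(K = k) for K ~ Poisson(rate)."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)

def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def sample_bivariate_normal(mu, cov, rng):
    """Draw one sample from N(mu, cov) using the Cholesky factor of a 2x2 covariance."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2)

def log_density_given_params(x, size_factor, mu, cov, n_inner, rng):
    """Monte Carlo estimate of log f(x | mu, cov) under the ASSUMED Poisson-lognormal
    model: average the Poisson likelihood of counts x over draws lambda ~ N(mu, cov)."""
    terms = []
    for _ in range(n_inner):
        lam = sample_bivariate_normal(mu, cov, rng)
        terms.append(sum(poisson_logpmf(k, size_factor * math.exp(l))
                         for k, l in zip(x, lam)))
    return logsumexp(terms) - math.log(n_inner)

def obc_classify(x, size_factor, posterior_draws, class_priors, n_inner=200, seed=0):
    """Hypothetical OBC decision rule: return the label maximizing
    log c_y + log f(x | y), where f(x | y) is approximated by averaging
    f(x | theta) over the supplied posterior draws theta = (mu, cov)."""
    rng = random.Random(seed)
    scores = {}
    for label, draws in posterior_draws.items():
        per_draw = [log_density_given_params(x, size_factor, mu, cov, n_inner, rng)
                    for mu, cov in draws]
        log_effective = logsumexp(per_draw) - math.log(len(draws))
        scores[label] = math.log(class_priors[label]) + log_effective
    return max(scores, key=scores.get), scores

if __name__ == "__main__":
    # Placeholder posterior draws for illustration only; in practice these
    # would come from an MCMC sampler run on training data.
    cov = [[0.1, 0.05], [0.05, 0.1]]
    draws = {
        0: [((math.log(10), math.log(50)), cov), ((math.log(11), math.log(48)), cov)],
        1: [((math.log(50), math.log(10)), cov), ((math.log(48), math.log(11)), cov)],
    }
    label, scores = obc_classify((12, 45), 1.0, draws, {0: 0.5, 1: 0.5})
    print(label, scores)
```

Working on the log scale with `logsumexp` keeps the averaged likelihoods from underflowing, which matters because products of Poisson probabilities over many counts become very small. Because each per-draw density is itself a Monte Carlo estimate, the result depends on `n_inner` and the random seed.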
