An empirical Bayes approach to network recovery using external knowledge

Reconstruction of a high-dimensional network may benefit substantially from the inclusion of prior knowledge on the network topology. In the case of gene interaction networks, such knowledge may come, for instance, from pathway repositories like KEGG, or be inferred from data of a pilot study. The Bayesian framework provides a natural means of including such prior knowledge. Based on a Bayesian Simultaneous Equation Model, we develop an Empirical Bayes (EB) procedure that automatically assesses the agreement of the prior knowledge with the data at hand. We use a variational Bayes method to approximate the posterior densities and compare its accuracy with that of a Gibbs sampling strategy. Our method is computationally fast and can outperform known competitors. In a simulation study, we show that accurate prior data can greatly improve the reconstruction of the network, whereas inaccurate prior data need not harm it. We demonstrate the benefits of the method in an analysis of gene expression data from GEO. In particular, the edges of the recovered network are more reproducible across resampled versions of the data than those of competing methods.
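The abstract does not spell out the model, so the following is only a minimal sketch of the empirical Bayes idea. It rests on simplifying assumptions of our own and is not the authors' variational Bayes procedure. We take a single node-wise regression of one gene on the others, in the spirit of a simultaneous equation model. The coefficients get independent Gaussian priors, with one variance for pairs flagged by the prior network and another for the rest. The noise variance is assumed known, and a grid search picks the two prior variances that maximize the marginal likelihood (type-II maximum likelihood). The function names `log_marginal_likelihood` and `empirical_bayes_group_vars` are hypothetical. The point is how the data can "assess" the prior. In this toy simulation, an accurate prior network yields a larger estimated variance for flagged edges than for unflagged ones. A wrong prior network reverses that ordering, so the flagged edges are shrunk rather than favoured.

```python
import numpy as np


def log_marginal_likelihood(y, X, groups, group_vars, noise_var):
    """Log density of y after integrating out beta ~ N(0, diag(v)),
    where v[k] = group_vars[groups[k]], i.e. y ~ N(0, noise_var*I + X diag(v) X^T)."""
    v = np.asarray(group_vars, dtype=float)[groups]
    cov = noise_var * np.eye(len(y)) + (X * v) @ X.T  # X * v scales column k by v[k]
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + quad)


def empirical_bayes_group_vars(y, X, groups, grid, noise_var=1.0):
    """Type-II maximum likelihood over a grid (hypothetical helper): choose the
    prior variances (no prior edge, prior edge) that maximise the marginal likelihood."""
    best, best_ll = None, -np.inf
    for v0 in grid:
        for v1 in grid:
            ll = log_marginal_likelihood(y, X, groups, (v0, v1), noise_var)
            if ll > best_ll:
                best, best_ll = (v0, v1), ll
    return best


# Toy node-wise regression: one "gene" regressed on 20 others, 5 of which truly matter.
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
y = X @ beta + rng.standard_normal(n)  # noise variance 1, assumed known below

grid = [1e-3, 1e-2, 0.1, 0.5, 1.0, 2.0, 5.0]
accurate = np.zeros(p, dtype=int)
accurate[:5] = 1      # prior network flags the true neighbours
wrong = np.zeros(p, dtype=int)
wrong[5:10] = 1       # prior network flags five irrelevant genes

print("accurate prior (v_no_edge, v_edge):", empirical_bayes_group_vars(y, X, accurate, grid))
print("wrong prior    (v_no_edge, v_edge):", empirical_bayes_group_vars(y, X, wrong, grid))
```

A grid search keeps the sketch transparent. A practical implementation would optimize the hyperparameters numerically and, as in the abstract, approximate the full posterior rather than fixing the noise variance.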
