Joint estimation of multiple networks from time course data

Graphical models are widely used to make inferences concerning interplay in multivariate systems. In many applications, data are collected from multiple related but nonidentical units whose underlying networks may differ but are likely to share features. Here we present a hierarchical Bayesian formulation for joint estimation of multiple networks in this nonidentically distributed setting. The approach is general: given a suitable class of graphical models, it uses an exchangeability assumption on networks to provide a corresponding joint formulation. Motivated by emerging experimental designs in molecular biology, we focus on time-course data with interventions, using dynamic Bayesian networks as the graphical models. We introduce a computationally efficient, deterministic algorithm for exact joint inference in this setting. We provide an upper bound on the gains that joint estimation offers relative to separate estimation for each network and empirical results that support and extend the theory, including an extensive simulation study and an application to proteomic data from human cancer cell lines. Finally, we describe approximations that are still more computationally efficient than the exact algorithm and that also demonstrate good empirical performance.
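As a rough illustration of how an exchangeability assumption can couple several networks, the toy sketch below computes exact posterior probabilities over candidate parent sets for one target variable in each of several units. It is not the algorithm of the paper. Everything in it is an assumption made for illustration: the latent shared parent set `G` with a uniform prior, the conditional prior proportional to `exp(-lam * |g Δ G|)`, the function names, and the hypothetical log marginal likelihoods supplied as input. Because units are independent given `G`, the sum over `G` can be carried out exactly by enumeration.

```python
import itertools
import math


def subsets(p):
    """All candidate parent sets over p possible parents, as frozensets."""
    return [frozenset(c) for r in range(p + 1)
            for c in itertools.combinations(range(p), r)]


def joint_posteriors(log_lik, p, lam=1.0):
    """Exact posterior over parent sets per unit under a toy hierarchical prior.

    log_lik[j][i] is the (hypothetical) log marginal likelihood of the i-th
    parent set, in the order returned by subsets(p), for unit j.
    Assumed prior: latent shared set G is uniform; each unit's set g is drawn
    independently given G with probability proportional to
    exp(-lam * |g symmetric-difference G|).
    """
    models = subsets(p)
    n_models = len(models)
    n_units = len(log_lik)

    # Rescale each unit's likelihoods by its maximum for numerical stability.
    lik = []
    for row in log_lik:
        m = max(row)
        lik.append([math.exp(v - m) for v in row])

    post = [[0.0] * n_models for _ in range(n_units)]
    for G in models:
        # Conditional prior p(g | G), normalized over all candidate sets g.
        w = [math.exp(-lam * len(g ^ G)) for g in models]
        z = sum(w)
        prior = [x / z for x in w]
        # Evidence of each unit given G (unit's own parent set summed out).
        msgs = [sum(prior[i] * lik[j][i] for i in range(n_models))
                for j in range(n_units)]
        total = math.prod(msgs)
        for j in range(n_units):
            others = total / msgs[j]  # evidence from all units except j
            for i in range(n_models):
                post[j][i] += others * prior[i] * lik[j][i]

    # Normalize each unit's posterior over parent sets.
    for j in range(n_units):
        z = sum(post[j])
        post[j] = [x / z for x in post[j]]
    return models, post


# Hypothetical inputs: unit 0 is uninformative; units 1 and 2 favor {0, 1}.
example_log_lik = [
    [0.0, 0.0, 0.0, 0.0],
    [-10.0, -10.0, -10.0, 0.0],
    [-10.0, -10.0, -10.0, 0.0],
]
models, post = joint_posteriors(example_log_lik, p=2, lam=2.0)
```

In this toy setup, setting `lam=0` removes the coupling, so each unit's posterior reduces to its own normalized likelihood. With `lam > 0`, an uninformative unit borrows strength from the others. Enumerating all parent sets is only feasible for a small number of candidate parents.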
