Learning Modular Structures from Network Data and Node Variables

A standard technique for understanding the underlying dependency structure among a set of variables posits a shared conditional probability distribution for the variables measured on individuals within a group. This approach is often referred to as module networks: individuals are represented by nodes in a network, groups are termed modules, and the focus is on estimating the network structure among modules. However, estimation from node-specific variables alone can lead to spurious dependencies, and unverifiable structural assumptions are often used for regularization. Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables. By integrating these complementary data types, we avoid the need for such structural assumptions. We illustrate the theoretical and practical significance of the model and develop a reversible-jump MCMC procedure for learning modules and model parameters. We demonstrate the method's accuracy in predicting modular structures from synthetic data and its capability to learn influence structures in Twitter data and regulatory modules in the Mycobacterium tuberculosis gene regulatory network.
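To make the two data types concrete, the following is a minimal toy sketch, not the model proposed in the paper. It assumes Gaussian node variables whose mean is shared within a module, and directed edges drawn with a higher probability inside a module than across modules (a simple blockmodel-style assumption). The function name `sample_toy_data` and all parameters (`p_in`, `p_out`, and so on) are hypothetical and chosen only for illustration.

```python
import random


def sample_toy_data(n_nodes=12, n_modules=3, n_obs=5,
                    p_in=0.8, p_out=0.05, seed=0):
    """Draw toy node variables and network edges from shared module labels.

    Illustrative assumptions (not taken from the paper):
      - each module has one Gaussian mean shared by all of its nodes;
      - an edge i -> j appears with probability p_in if i and j are in the
        same module, and p_out otherwise.
    """
    rng = random.Random(seed)

    # Assign nodes to modules in round-robin order.
    modules = [i % n_modules for i in range(n_nodes)]

    # One mean per module: nodes in the same module share a distribution.
    means = [rng.gauss(0.0, 3.0) for _ in range(n_modules)]

    # Node-specific variables: n_obs noisy draws around the module mean.
    variables = [
        [rng.gauss(means[modules[i]], 1.0) for _ in range(n_obs)]
        for i in range(n_nodes)
    ]

    # Direct network observations: directed edges without self-loops.
    edges = set()
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue
            p = p_in if modules[i] == modules[j] else p_out
            if rng.random() < p:
                edges.add((i, j))

    return modules, variables, edges


if __name__ == "__main__":
    modules, variables, edges = sample_toy_data()
    print(len(modules), len(variables), len(edges))
```

In this toy setting both the variables and the edges carry information about the same latent module labels, which is the intuition behind combining them; the model and reversible-jump MCMC sampler described in the abstract are considerably more general.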
