Variational Inference of Sparse Network from Count Data

The problem of network reconstruction from continuous data has been extensively studied, and most state-of-the-art methods rely on variants of Gaussian Graphical Models (GGMs). Unfortunately, GGMs are poorly suited to sparse count data spanning several orders of magnitude. Most inference methods for count data (SparCC, REBACCA, SPIEC-EASI, gCoda, etc.) first transform the counts into pseudo-Gaussian observations and then apply a GGM. We rely instead on a Poisson log-normal (PLN) model, in which counts follow Poisson distributions whose parameters are sampled from a latent multivariate Gaussian variable, and we infer the network in the latent space using a variational inference procedure. This model allows us to (i) control for confounding covariates and differences in sampling effort and (ii) integrate data sets from different origins. It is also competitive with state-of-the-art methods in terms of speed and accuracy.
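
To make the generative side of the model concrete, the following is a minimal simulation sketch, not the authors' implementation. It assumes a log link in which covariate effects and a log sampling-effort offset are added to the latent Gaussian layer, and it uses a hand-built sparse precision matrix (a chain graph) as the network. The function name `simulate_pln` and every parameter name and numeric value are hypothetical choices made for illustration; the variational inference step itself is not shown.

```python
import numpy as np


def simulate_pln(n, precision, covariates=None, coef=None, offsets=None, seed=0):
    """Draw counts from a Poisson log-normal model (illustrative sketch).

    precision  : (p, p) positive-definite matrix; its zero pattern encodes the network
    covariates : optional (n, d) design matrix
    coef       : optional (d, p) regression coefficients
    offsets    : optional (n, 1) or (n, p) log sampling efforts
    """
    rng = np.random.default_rng(seed)
    p = precision.shape[0]
    sigma = np.linalg.inv(precision)  # latent covariance matrix
    # Latent Gaussian layer: one p-dimensional vector per sample.
    latent = rng.multivariate_normal(np.zeros(p), sigma, size=n)
    log_rate = latent
    if covariates is not None:
        log_rate = log_rate + covariates @ coef  # covariate effects
    if offsets is not None:
        log_rate = log_rate + offsets  # differences in sampling effort
    # Observed layer: counts are independent Poisson given the latent layer.
    counts = rng.poisson(np.exp(log_rate))
    return counts, latent


# Hypothetical example: 5 species linked along a chain.
p = 5
omega = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))

rng = np.random.default_rng(1)
n = 200
x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one covariate
b = rng.normal(scale=0.5, size=(2, p))                 # hypothetical coefficients
o = np.log(rng.uniform(0.5, 2.0, size=(n, 1)))         # hypothetical log offsets

counts, latent = simulate_pln(n, omega, covariates=x, coef=b, offsets=o)
```

In this sketch the network is read from the nonzero off-diagonal entries of `omega`, so the inference task described above amounts to recovering that sparsity pattern from `counts`, `x` and `o` alone.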
