Graphical models via univariate exponential family distributions

Undirected graphical models, or Markov networks, are a popular class of statistical models used in a wide variety of applications. Popular instances of this class include Gaussian graphical models and Ising models. In many settings, however, it might not be clear which subclass of graphical models to use, particularly for non-Gaussian and non-categorical data. In this paper, we consider a general subclass of graphical models in which the node-wise conditional distributions arise from exponential families. This allows us to derive multivariate graphical model distributions from univariate exponential family distributions, such as the Poisson, negative binomial, and exponential distributions. Our key contributions include a class of M-estimators to fit these graphical model distributions, and a rigorous statistical analysis showing that these M-estimators recover the true graphical model structure exactly, with high probability. We provide examples of genomic and proteomic networks learned via instances of our class of graphical models derived from Poisson and exponential distributions.
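
To make the node-wise idea concrete, the following is a minimal sketch of one possible M-estimator of this kind for the Poisson case: each variable is regressed on all the others with an l1-penalized Poisson log-likelihood, and the nonzero coefficients are read as candidate edges. The abstract does not spell out the estimator, so everything here is an assumption made for illustration: the function names (`fit_node`, `estimate_graph`), the proximal-gradient solver, the penalty level `lam`, the AND rule for combining neighborhoods, and the toy data, which come from a simple chain of draws rather than from an exact sampler of the joint model.

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def poisson_nll(X, y, b0, w):
    # Average Poisson negative log-likelihood (log link), up to a constant in y.
    with np.errstate(over="ignore", invalid="ignore"):
        eta = b0 + X @ w
        return np.mean(np.exp(eta) - y * eta)

def fit_node(X, y, lam, n_iter=500):
    # l1-penalized Poisson regression of y on X via proximal gradient
    # with backtracking. Covariates are centered, which leaves the slopes
    # unchanged; the intercept b0 is not penalized.
    X = X - X.mean(axis=0)
    n, p = X.shape
    b0, w, step = np.log(y.mean() + 1e-8), np.zeros(p), 1.0
    for _ in range(n_iter):
        r = np.exp(b0 + X @ w) - y          # derivative of the loss w.r.t. eta
        g0, gw = r.mean(), X.T @ r / n
        f = poisson_nll(X, y, b0, w)
        while True:
            b0_new = b0 - step * g0
            w_new = soft_threshold(w - step * gw, step * lam)
            d0, dw = b0_new - b0, w_new - w
            bound = f + g0 * d0 + gw @ dw + (d0 ** 2 + dw @ dw) / (2 * step)
            if poisson_nll(X, y, b0_new, w_new) <= bound + 1e-12:
                break
            step *= 0.5                     # shrink the step until the bound holds
        b0, w = b0_new, w_new
    return b0, w

def estimate_graph(X, lam, tol=1e-8):
    # Node-wise neighborhood selection; an edge is kept only if both
    # endpoints select each other (AND rule, an illustrative choice).
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    selected = np.zeros((p, p), dtype=bool)
    for s in range(p):
        others = [t for t in range(p) if t != s]
        _, w = fit_node(X[:, others], X[:, s], lam)
        selected[s, others] = np.abs(w) > tol
    return selected & selected.T

# Toy count data: X1 depends negatively on X0, X2 is independent of both.
rng = np.random.default_rng(0)
n = 5000
x0 = rng.poisson(3.0, size=n)
x1 = rng.poisson(np.exp(1.0 - 0.3 * x0))
x2 = rng.poisson(2.0, size=n)
adj = estimate_graph(np.column_stack([x0, x1, x2]), lam=0.3)
print(adj.astype(int))
```

On this toy example the estimated adjacency matrix would be expected to contain the single edge between the first two variables. In practice the penalty level would need to be chosen with care, for instance by a stability-based selection procedure.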
