A Local Poisson Graphical Model for Inferring Networks From Sequencing Data

Gaussian graphical models, a class of undirected graphical models or Markov networks, are often used to infer gene networks from microarray expression data. Many scientists, however, have begun using high-throughput next-generation sequencing technologies, such as RNA sequencing (RNA-seq), to measure gene expression. Because the resulting data consist of counts of sequencing reads for each gene, Gaussian graphical models are not well suited to these discrete data. In this paper, we propose a novel method for inferring gene networks from sequencing data: the Local Poisson Graphical Model. Our model assumes a local Markov property in which each variable, conditional on all other variables, is Poisson distributed. We develop a neighborhood selection algorithm that fits our model locally by performing a series of ℓ1-penalized Poisson, or log-linear, regressions, one for each variable. This yields a fast, parallelizable algorithm for estimating networks from next-generation sequencing data. In simulations, we illustrate the effectiveness of our method for recovering network structure from count data. A case study on breast cancer microRNAs (miRNAs), a novel application of graphical models, finds known regulators of breast cancer genes and discovers novel miRNA clusters and hubs that are targets for future research.
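The neighborhood selection step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume each count X_j is regressed on the other variables through the log-linear model log E[X_j | X_{-j}] = α_j + Σ_{k≠j} β_jk z_k, where z_k is the standardized count of variable k (standardizing is our choice, made for numerical stability), and that j and k are joined by an edge when the fitted coefficients are nonzero. The ℓ1-penalized fits use proximal gradient descent with backtracking, a generic solver swapped in for illustration. The function names (`soft_threshold`, `poisson_lasso`, `local_poisson_graph`), the fixed penalty `lam`, and the `rule` argument ("and" requires both β_jk and β_kj to be nonzero, "or" requires either, as in Meinshausen and Bühlmann's neighborhood selection) are all hypothetical.

```python
import math


def soft_threshold(z, t):
    """Proximal operator of t*|b|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _exp(e):
    # Cap the linear predictor so exploratory line-search steps cannot overflow.
    return math.exp(min(e, 500.0))


def standardize(col):
    """Center a column and scale it to unit variance (a constant column becomes zeros)."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
    return [(v - mean) / sd if sd > 0 else 0.0 for v in col]


def poisson_nll(theta, y, X):
    """Average Poisson negative log-likelihood (dropping log y!) under a log link."""
    return sum(_exp(dot(row, theta)) - yi * dot(row, theta) for row, yi in zip(X, y)) / len(y)


def poisson_lasso(y, X, lam, iters=300):
    """l1-penalized Poisson regression by proximal gradient with backtracking.

    X[i][0] must be 1.0 (intercept, left unpenalized); returns the other coefficients.
    """
    n = len(y)
    theta = [math.log(max(sum(y) / n, 1e-8))] + [0.0] * (len(X[0]) - 1)
    t = 1.0
    for _ in range(iters):
        mu = [_exp(dot(row, theta)) for row in X]
        grad = [sum((m - yi) * row[k] for m, yi, row in zip(mu, y, X)) / n
                for k in range(len(theta))]
        f_old = poisson_nll(theta, y, X)
        while True:
            # Gradient step, then soft-threshold every coefficient except the intercept.
            new = [theta[0] - t * grad[0]] + [
                soft_threshold(b - t * g, t * lam) for b, g in zip(theta[1:], grad[1:])]
            d = [a - b for a, b in zip(new, theta)]
            if poisson_nll(new, y, X) <= f_old + dot(grad, d) + dot(d, d) / (2 * t) + 1e-12:
                break
            t *= 0.5  # step too long for the local curvature: halve it
        theta = new
    return theta[1:]


def local_poisson_graph(counts, lam, rule="and"):
    """Neighborhood selection: one l1-penalized Poisson regression per variable.

    counts is a list of n samples, each a list of p nonnegative counts.
    Returns undirected edges as (j, k) pairs with j < k.
    """
    n, p = len(counts), len(counts[0])
    cols = [standardize([row[j] for row in counts]) for j in range(p)]
    nbrs = []
    for j in range(p):  # the p regressions are independent of one another
        others = [k for k in range(p) if k != j]
        X = [[1.0] + [cols[k][i] for k in others] for i in range(n)]
        beta = poisson_lasso([row[j] for row in counts], X, lam)
        nbrs.append({k for k, b in zip(others, beta) if b != 0.0})
    edges = set()
    for j in range(p):
        for k in nbrs[j]:
            if rule == "or" or j in nbrs[k]:
                edges.add((min(j, k), max(j, k)))
    return edges


if __name__ == "__main__":
    # Toy data: variables 0 and 1 are identical; variable 2 is balanced against both.
    toy = [[a, a, b] for a in (1, 5) for b in (2, 4)] * 5
    print(local_poisson_graph(toy, lam=0.5))  # expected: {(0, 1)}
```

Because each regression depends only on the data, the loop over variables is where the parallelism comes from; in CPython, a process pool (rather than threads) would be needed to speed up pure-Python fits like these. The penalty is fixed by hand here, whereas in practice it has to be tuned.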
