GRAPHICAL MODELS FOR ZERO-INFLATED SINGLE CELL GENE EXPRESSION

Bulk gene expression experiments relied on aggregations of thousands of cells to measure the average expression in an organism. Advances in microfluidic and droplet sequencing now permit expression profiling in single cells. Studying cell-to-cell variation at this resolution reveals that individual cells lack detectable expression of transcripts that appear abundant at the population level, giving rise to zero-inflated expression patterns. To infer gene co-regulatory networks from such data, we propose a multivariate Hurdle model, which comprises a mixture of singular Gaussian distributions. We employ neighborhood selection with the pseudo-likelihood and a group lasso penalty to select and fit undirected graphical models that capture conditional independences between genes. In simulations, the proposed method is more sensitive than existing approaches, even under departures from our Hurdle model. We apply the method to data for T follicular helper cells and to a high-dimensional profile of mouse dendritic cells. It infers network structure not revealed by other methods or by bulk data sets. An R implementation is available at https://github.com/amcdavid/HurdleNormal.
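The group lasso selection step can be illustrated with a minimal sketch. In neighborhood selection, each gene is regressed on all the others, and an edge between genes j and k is kept when the whole block of coefficients belonging to gene k is not shrunk to zero. The sketch assumes a plain least-squares loss, (1/2n)||y - Xb||^2 + lam * sum_g ||b_g||_2, in place of the Hurdle pseudo-likelihood, and assumes each candidate neighbor contributes one group of columns (for example, its zero/nonzero indicator and its expression level). It solves the problem by proximal gradient descent with block soft-thresholding. The function names are hypothetical and are not the HurdleNormal API.

```python
import math


def block_soft_threshold(v, thresh):
    """Proximal operator of thresh * ||v||_2: shrink the whole block toward zero,
    setting it exactly to zero when its Euclidean norm is at most thresh."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm <= thresh:
        return [0.0] * len(v)
    scale = 1.0 - thresh / norm
    return [scale * x for x in v]


def group_lasso(X, y, groups, lam, n_iter=5000):
    """Minimize (1/2n)||y - X b||^2 + lam * sum_g ||b_g||_2 by proximal gradient (ISTA).

    Assumes `groups` is a list of column-index lists that partitions the columns of X.
    """
    n, p = len(X), len(X[0])
    # Step size t = n / ||X||_F^2 is at most 1 / L, where L = lambda_max(X^T X) / n
    # is the Lipschitz constant of the gradient, because trace(X^T X) >= lambda_max.
    frob2 = sum(x * x for row in X for x in row)
    t = n / frob2
    b = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the least-squares loss: -(1/n) X^T (y - X b).
        resid = [y[i] - sum(X[i][k] * b[k] for k in range(p)) for i in range(n)]
        grad = [-sum(X[i][k] * resid[i] for i in range(n)) / n for k in range(p)]
        step = [b[k] - t * grad[k] for k in range(p)]
        # Proximal step: block soft-threshold each group with threshold t * lam.
        for g in groups:
            shrunk = block_soft_threshold([step[k] for k in g], t * lam)
            for k, val in zip(g, shrunk):
                step[k] = val
        b = step
    return b


def selected_groups(b, groups):
    """Indices of groups with any nonzero coefficient (the selected neighbors)."""
    return [gi for gi, g in enumerate(groups) if any(b[k] != 0.0 for k in g)]


# Orthogonal toy design (4x4 Hadamard matrix, so X^T X / n = I): the exact solution
# is block soft-thresholding of X^T y / n = z = [3, 4, 0.3, 0.4] at level lam.
X = [[1, 1, 1, 1], [1, -1, 1, -1], [1, 1, -1, -1], [1, -1, -1, 1]]
z = [3.0, 4.0, 0.3, 0.4]
y = [sum(r[k] * z[k] for k in range(4)) for r in X]
groups = [[0, 1], [2, 3]]  # hypothetical neighbor genes A (cols 0-1) and B (cols 2-3)
b = group_lasso(X, y, groups, lam=1.0)
print([round(v, 6) for v in b])  # [2.4, 3.2, 0.0, 0.0]
print(selected_groups(b, groups))  # [0]
```

In this toy design, group 0 has norm 5 and is shrunk by the factor 1 - 1/5 to (2.4, 3.2), while group 1 has norm 0.5, below lam = 1, and is set exactly to zero, so only neighbor A would be linked to the target gene. The conservative step size avoids an eigenvalue computation at the cost of slower convergence on poorly conditioned designs.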
