Exploiting single-cell expression to characterize co-expression replicability

Background: Co-expression networks have been a useful tool for functional genomics, providing important clues about the cellular and biochemical mechanisms that are active in normal and disease processes. However, co-expression analysis is often treated as a black box with results being hard to trace to their basis in the data. Here, we use both published and novel single-cell RNA sequencing (RNA-seq) data to understand fundamental drivers of gene-gene connectivity and replicability in co-expression networks.

Results: We perform the first major analysis of single-cell co-expression, sampling from 31 individual studies. Using neighbor voting in cross-validation, we find that single-cell network connectivity is less likely to overlap with known functions than co-expression derived from bulk data, with functional variation within cell types strongly resembling that also occurring across cell types. To identify features and analysis practices that contribute to this connectivity, we perform our own single-cell RNA-seq experiment of 126 cortical interneurons in an experimental design targeted to co-expression. By assessing network replicability, semantic similarity and overall functional connectivity, we identify technical factors influencing co-expression and suggest how they can be controlled for. Many of the technical effects we identify are expression-level dependent, making expression level itself highly predictive of network topology. We show this occurs generally through re-analysis of the BrainSpan RNA-seq data.

Conclusions: Technical properties of single-cell RNA-seq data create confounds in co-expression networks which can be identified and explicitly controlled for in any supervised analysis. This is useful both in improving co-expression performance and in characterizing single-cell data in generally applicable terms, permitting cross-laboratory comparison within a common framework.
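
As an illustration of the neighbor voting in cross-validation mentioned in the Results, the following is a minimal sketch and not the authors' implementation. It assumes a symmetric, non-negative gene-by-gene weight matrix and a boolean vector marking membership in one known functional set; the function names `auroc` and `neighbor_voting_auroc`, the fold count, and the degree normalization are choices made here for illustration. Each gene is scored by the share of its connection weight that goes to the known set members left in training, and the held-out members are then ranked against all other genes.

```python
import numpy as np


def auroc(scores, labels):
    """Rank-based AUROC (Mann-Whitney U); tied scores share their average rank."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(scores, kind="mergesort")
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def neighbor_voting_auroc(network, in_set, n_folds=3, seed=0):
    """Mean cross-validated AUROC for recovering hidden members of a gene set.

    Assumes `network` is a symmetric, non-negative weight matrix and that
    `in_set` contains at least `n_folds` members and at least one non-member.
    """
    network = np.asarray(network, dtype=float)
    in_set = np.asarray(in_set, dtype=bool)
    rng = np.random.default_rng(seed)
    positives = rng.permutation(np.flatnonzero(in_set))
    degree = network.sum(axis=1)
    fold_aurocs = []
    for held_out in np.array_split(positives, n_folds):
        if held_out.size == 0:
            continue
        # Hide this fold's labels; the remaining members cast the votes.
        train = in_set.copy()
        train[held_out] = False
        votes = np.divide(
            network[:, train].sum(axis=1),
            degree,
            out=np.zeros_like(degree),
            where=degree > 0,
        )
        # Rank held-out members against every gene not used for voting.
        candidates = ~train
        fold_aurocs.append(auroc(votes[candidates], in_set[candidates]))
    return float(np.mean(fold_aurocs))
```

An AUROC near 0.5 under this sketch would indicate that network connectivity carries little information about the gene set, while values closer to 1 indicate that set members are preferentially connected to one another.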
