Characterizing the replicability of cell types defined by single cell RNA-sequencing data using MetaNeighbor

Single-cell RNA-sequencing (scRNA-seq) technology provides a new avenue to discover and characterize cell types; however, the experiment-specific technical biases and analytic variability inherent to current pipelines may undermine its replicability. Meta-analysis is further hampered by the use of ad hoc naming conventions. Here we demonstrate our replication framework, MetaNeighbor, which quantifies the degree to which cell types replicate across datasets and enables rapid identification of clusters with high similarity. We first measure the replicability of neuronal identity, comparing results across eight technically and biologically diverse datasets to define best practices for more complex assessments. We then apply this framework to novel interneuron subtypes, finding that 24/45 subtypes have evidence of replication, which enables the identification of robust candidate marker genes. Across tasks, we find that large sets of variably expressed genes can identify replicable cell types with high accuracy, suggesting a general route forward for large-scale evaluation of scRNA-seq data.

Single-cell RNA-sequencing analysis poses challenges in replication due to technical biases and analytic variability among bioinformatics pipelines. Here, Crow et al. develop MetaNeighbor for measuring cell-type replication across datasets, and use it to identify marker genes for neuron subtypes with evidence of replication.
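The abstract does not spell out how replicability is scored, so the following is only an illustrative sketch of one plausible reading: cells are linked by the similarity of their expression profiles, labels from one dataset vote on cells in a second dataset, and the result is summarized as an AUROC. It is not the authors' implementation. The function names, the rescaling of correlations to [0, 1], and the synthetic two-type data with a per-gene batch shift are all assumptions made for the example.

```python
import numpy as np


def spearman_network(expr):
    """Cell-by-cell similarity from Spearman correlation (cells x genes input).

    Genes are ranked within each cell; Pearson correlation on those ranks is
    Spearman correlation (ties are ignored here). Rescaling to [0, 1] is an
    assumption of this sketch so that voting weights are non-negative.
    """
    ranks = np.argsort(np.argsort(expr, axis=1), axis=1).astype(float)
    corr = np.corrcoef(ranks)
    return (corr + 1.0) / 2.0


def neighbor_vote(network, train_idx, test_idx, train_is_type):
    """Score each test cell by the weighted fraction of its links to labeled cells."""
    weights = network[np.ix_(test_idx, train_idx)]
    return weights @ train_is_type / weights.sum(axis=1)


def auroc(scores, is_positive):
    """AUROC via the rank-sum identity (assumes no tied scores)."""
    ranks = np.argsort(np.argsort(scores)) + 1
    n_pos = is_positive.sum()
    n_neg = len(scores) - n_pos
    return (ranks[is_positive].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate_dataset(rng, n_per_type, n_genes, batch_shift):
    """Hypothetical data: type A is high in the first half of genes, type B in the second."""
    half = n_genes // 2
    labels = np.array(["A"] * n_per_type + ["B"] * n_per_type)
    expr = rng.normal(size=(2 * n_per_type, n_genes))
    expr[:n_per_type, :half] += 2.0
    expr[n_per_type:, half:] += 2.0
    return expr + batch_shift, labels


rng = np.random.default_rng(0)
n_genes = 50
expr1, labels1 = simulate_dataset(rng, 30, n_genes, 0.0)
# The second dataset gets a per-gene offset standing in for a batch effect.
expr2, labels2 = simulate_dataset(rng, 30, n_genes, rng.normal(scale=0.5, size=n_genes))

network = spearman_network(np.vstack([expr1, expr2]))
train_idx = np.arange(len(labels1))
test_idx = np.arange(len(labels1), len(labels1) + len(labels2))

# Train on dataset 1 labels for type A, then test on the held-out dataset 2.
scores = neighbor_vote(network, train_idx, test_idx, (labels1 == "A").astype(float))
auroc_a = auroc(scores, labels2 == "A")
print(f"AUROC for type A, dataset 1 -> dataset 2: {auroc_a:.2f}")
```

In this toy setup, an AUROC near 1 would mean type A cells in the second dataset are reliably ranked above the others by votes from the first dataset, while a value near 0.5 would indicate no evidence of replication. Real data would also call for selecting variably expressed genes first, which the sketch omits.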
