Comprehensive integration of single cell data

Single cell transcriptomics (scRNA-seq) has transformed our ability to discover and annotate cell types and states, but deep biological understanding requires more than a taxonomic listing of clusters. As new methods arise to measure distinct cellular modalities, including high-dimensional immunophenotypes, chromatin accessibility, and spatial positioning, a key analytical challenge is to integrate these datasets into a harmonized atlas that can be used to better understand cellular identity and function. Here, we develop a computational strategy to “anchor” diverse datasets together, enabling us to integrate and compare single cell measurements not only across scRNA-seq technologies but also across different modalities. After demonstrating that this strategy substantially improves upon existing methods for data integration, we anchor scRNA-seq experiments with scATAC-seq datasets to explore chromatin differences in closely related interneuron subsets, and we project single cell protein measurements onto a human bone marrow atlas to annotate and characterize lymphocyte populations. Lastly, we demonstrate how anchoring can harmonize in situ gene expression and scRNA-seq datasets, allowing for the transcriptome-wide imputation of spatial gene expression patterns and the identification of spatial relationships between mapped cell types in the visual cortex. Our work presents a strategy for the comprehensive integration of single cell data, including the assembly of harmonized references and the transfer of information across datasets.

Availability: Installation instructions, documentation, and tutorials are available at https://www.satijalab.org/seurat
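The abstract does not describe how anchors are computed. The Python sketch that follows illustrates one plausible reading under explicit assumptions: both datasets are already embedded in a shared low-dimensional space, an "anchor" is a pair of cells that are mutual nearest neighbors across the two datasets after L2 normalization, and information is transferred by a simple majority vote over each query cell's anchors. The function names (`find_anchors`, `transfer_labels`) and the toy data are hypothetical. The sketch omits steps a real pipeline would need, such as constructing the shared space and scoring or filtering anchors, and it is not the Seurat implementation.

```python
import numpy as np


def l2_normalize(x):
    # Scale each cell's vector to unit length so neighbor search reflects
    # direction in the shared space rather than overall magnitude.
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    return x / norms


def knn_indices(query, ref, k):
    # Brute-force Euclidean k nearest neighbors: for each row of `query`,
    # the indices of its k closest rows in `ref`.
    d = ((query[:, None, :] - ref[None, :, :]) ** 2).sum(axis=2)
    return np.argsort(d, axis=1)[:, :k]


def find_anchors(ref, query, k=5):
    # Assumed anchor definition (hypothetical): a (ref cell, query cell) pair
    # in which each cell is among the other's k nearest cross-dataset neighbors.
    ref, query = l2_normalize(ref), l2_normalize(query)
    nn_q_in_ref = knn_indices(query, ref, k)    # ref neighbors of each query cell
    nn_r_in_query = knn_indices(ref, query, k)  # query neighbors of each ref cell
    anchors = []
    for q in range(query.shape[0]):
        for r in nn_q_in_ref[q]:
            if q in nn_r_in_query[r]:
                anchors.append((int(r), q))
    return anchors


def transfer_labels(anchors, ref_labels, n_query):
    # Hypothetical simplification: each query cell takes the majority label of
    # the reference cells it is anchored to; unanchored cells get None.
    votes = [{} for _ in range(n_query)]
    for r, q in anchors:
        label = ref_labels[r]
        votes[q][label] = votes[q].get(label, 0) + 1
    return [max(v, key=v.get) if v else None for v in votes]


# Toy data: two well-separated populations in an assumed shared 2-D space;
# the query is shifted to mimic a modest batch effect.
rng = np.random.default_rng(0)
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
ref = np.vstack([c + rng.normal(0, 0.3, (20, 2)) for c in centers])
ref_labels = ["T"] * 20 + ["B"] * 20
query = np.vstack([c + rng.normal(0, 0.3, (10, 2)) for c in centers]) + 0.2
query_truth = ["T"] * 10 + ["B"] * 10

anchors = find_anchors(ref, query, k=5)
pred = transfer_labels(anchors, ref_labels, query.shape[0])
print(len(anchors), pred)
```

Requiring the neighbor relation to hold in both directions is what makes a pair an anchor in this sketch: it discards matches that look similar only from one dataset's side. The brute-force distance matrix keeps the example dependency-free but grows quadratically with cell number, so an analysis at atlas scale would use an approximate nearest-neighbor index instead.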
