Integrating single-cell transcriptomic data across different conditions, technologies, and species

Computational single-cell RNA-seq (scRNA-seq) methods have been successfully applied to experiments representing a single condition, technology, or species to discover and define cellular phenotypes. However, identifying subpopulations of cells that are present across multiple data sets remains challenging. Here, we introduce an analytical strategy for integrating scRNA-seq data sets based on common sources of variation, enabling the identification of shared populations across data sets and downstream comparative analysis. We apply this approach, implemented in our R toolkit Seurat (http://satijalab.org/seurat/), to align scRNA-seq data sets of peripheral blood mononuclear cells under resting and stimulated conditions, hematopoietic progenitors sequenced using two profiling technologies, and pancreatic cell 'atlases' generated from human and mouse islets. In each case, we learn distinct or transitional cell states jointly across data sets, while boosting statistical power through integrated analysis. Our approach facilitates general comparisons of scRNA-seq data sets, potentially deepening our understanding of how distinct cell states respond to perturbation, disease, and evolution.
