scAlign: a tool for alignment, integration, and rare cell identification from scRNA-seq data

Integration of scRNA-seq datasets arises in several contexts, such as identifying cell type-specific differences in gene expression across conditions or species, or correcting batch effects. We present scAlign, an unsupervised deep learning method for data integration that can incorporate partial, overlapping, or complete sets of cell labels, and that estimates per-cell differences in gene expression across datasets. scAlign achieves state-of-the-art performance and is robust to cross-dataset variation in cell type-specific expression and in cell type composition. We demonstrate that scAlign identifies a rare cell population likely to drive malaria transmission. Our framework is widely applicable to integration challenges in other domains.
