INSCT: Integrating millions of single cells using batch-aware triplet neural networks

Efficient integration of heterogeneous and increasingly large single-cell RNA sequencing (scRNA-seq) data poses a major challenge for analysis, in particular for comprehensive atlasing efforts. Here, we developed INSCT (“Insight”), a novel deep learning algorithm that uses batch-aware triplet neural networks to overcome batch effects. Using simulated and real data, we demonstrate that INSCT generates an embedding space that accurately integrates cells across experiments, platforms and species. Our benchmark comparisons with current state-of-the-art scRNA-seq integration methods revealed that INSCT outperforms competing methods in scalability while achieving comparable accuracy. Moreover, using INSCT in semi-supervised mode enables users to classify unlabeled cells by projecting them into a reference collection of annotated cells. To demonstrate scalability, we applied INSCT to integrate more than 2.6 million transcriptomes from four independent studies of mouse brains in less than 1.5 hours using less than 25 gigabytes of memory. This efficiency allows researchers to perform atlas-scale data integration on a typical desktop computer. INSCT is freely available at https://github.com/lkmklsmn/insct.

Highlights
INSCT accurately integrates multiple scRNA-seq datasets
INSCT accurately predicts cell types for an independent scRNA-seq dataset
Efficient deep learning framework enables integration of millions of cells on a personal computer
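
The abstract names two mechanisms without detailing them: a batch-aware triplet objective and the classification of unlabeled cells by projecting them onto annotated reference cells. The NumPy sketch below illustrates one plausible form of each. It is not the INSCT implementation, which is available in the repository linked above. The abstract does not say how triplets are built, so this sketch makes several assumptions. It treats a mutual-nearest-neighbour pair across two batches as anchor and positive. It draws the negative from the anchor's own batch, choosing a cell outside the anchor's k nearest within-batch neighbours. It implements projection-based classification as a k-nearest-neighbour majority vote. All function names, the parameters `k` and `margin`, and the toy data are hypothetical.

```python
import numpy as np


def pairwise_sq_dists(a, b):
    """Squared Euclidean distances between the rows of a (n, d) and b (m, d)."""
    return np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=2)


def mnn_pairs(x, batches, b1, b2, k=5):
    """Mutual nearest-neighbour pairs (i, j): cell i in batch b1, cell j in batch b2."""
    idx1 = np.flatnonzero(batches == b1)
    idx2 = np.flatnonzero(batches == b2)
    d = pairwise_sq_dists(x[idx1], x[idx2])
    knn12 = np.argsort(d, axis=1)[:, :k]    # local b2 indices nearest to each b1 cell
    knn21 = np.argsort(d.T, axis=1)[:, :k]  # local b1 indices nearest to each b2 cell
    pairs = []
    for i_local, neighbours in enumerate(knn12):
        for j_local in neighbours:
            if i_local in knn21[j_local]:   # keep only mutual neighbours
                pairs.append((idx1[i_local], idx2[j_local]))
    return pairs


def sample_triplets(x, batches, b1, b2, k=5, seed=0):
    """Hypothetical batch-aware triplets (anchor, positive, negative) as global indices.

    anchor/positive: an MNN pair across batches b1 and b2 (assumption).
    negative: a random cell from the anchor's own batch lying outside the
    anchor's k nearest within-batch neighbours (assumption).
    """
    rng = np.random.default_rng(seed)
    same = np.flatnonzero(batches == b1)
    # k + 1 because each cell is its own nearest neighbour (distance 0)
    near = np.argsort(pairwise_sq_dists(x[same], x[same]), axis=1)[:, : k + 1]
    local = {int(c): i for i, c in enumerate(same)}
    triplets = []
    for a, p in mnn_pairs(x, batches, b1, b2, k):
        far = np.setdiff1d(np.arange(len(same)), near[local[int(a)]])
        if far.size == 0:
            continue                         # batch too small to find a negative
        triplets.append((a, p, same[rng.choice(far)]))
    return np.array(triplets, dtype=int).reshape(-1, 3)


def triplet_loss(emb, triplets, margin=1.0):
    """Mean of max(0, |a - p|^2 - |a - n|^2 + margin) over all triplets."""
    a, p, n = (emb[triplets[:, i]] for i in range(3))
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


def knn_label_transfer(ref_emb, ref_labels, query_emb, k=5):
    """Label each query cell by majority vote among its k nearest reference cells."""
    nn = np.argsort(pairwise_sq_dists(query_emb, ref_emb), axis=1)[:, :k]
    out = []
    for row in nn:
        values, counts = np.unique(ref_labels[row], return_counts=True)
        out.append(values[np.argmax(counts)])
    return np.array(out)


# Toy data: two "cell types" measured in two batches; batch 1 is shifted by +3 in y.
rng = np.random.default_rng(42)
types = np.repeat([0, 1], 20)
centres = np.array([[0.0, 0.0], [10.0, 0.0]])
x1 = centres[types] + rng.normal(scale=0.5, size=(40, 2))
x2 = centres[types] + np.array([0.0, 3.0]) + rng.normal(scale=0.5, size=(40, 2))
x = np.vstack([x1, x2])
batches = np.repeat([0, 1], 40)
labels = np.concatenate([types, types])

triplets = sample_triplets(x, batches, b1=0, b2=1, k=5)
predicted = knn_label_transfer(x1, types, x2)
print(len(triplets), "triplets; loss on raw coordinates:", triplet_loss(x, triplets))
print("transferred labels match:", np.array_equal(predicted, types))
```

In a triplet network, `emb` would be the network's output rather than raw coordinates. Training would adjust the network weights to lower this loss, pulling matched cells from different batches together relative to dissimilar cells.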
