JIND: joint integration and discrimination for automated single-cell annotation

Single-cell RNA-seq is a powerful tool for studying the cellular composition of different tissues and organisms. A key step in the analysis pipeline is the annotation of cell types based on the expression of specific marker genes. Since manual annotation is labor-intensive and does not scale to large datasets, several methods for automated cell-type annotation based on supervised learning have been proposed. However, these methods generally require feature extraction and batch alignment prior to classification, and their performance may become unreliable in the presence of cell types with very similar transcriptomic profiles, such as differentiating cells. We propose JIND, a neural-network-based framework for automated cell-type identification that directly learns a low-dimensional representation (latent code) in which cell types can be reliably determined. To account for batch effects, JIND performs a novel asymmetric alignment in which the transcriptomic profiles of unseen cells are mapped onto the previously learned latent space, avoiding the need to retrain the model whenever a new dataset becomes available. JIND also learns cell-type-specific confidence thresholds to identify and reject cells that cannot be reliably classified. We show on datasets with and without batch effects that JIND classifies cells more accurately than previously proposed methods while rejecting only a small proportion of cells. Moreover, JIND batch alignment is parallelizable and runs more than five to six times faster than Seurat integration. Availability: https://github.com/mohit1997/JIND.
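The idea of cell-type-specific confidence thresholds can be illustrated with a minimal sketch. The abstract does not state how JIND chooses its thresholds, so the rule below is an assumption for illustration only: for each cell type, the threshold is a low quantile (5% by default) of the predicted probability among held-out labeled cells that were correctly classified as that type. A new cell whose top predicted probability falls below the threshold of its predicted type is marked as unassigned. The function names `fit_class_thresholds` and `classify_with_rejection` are hypothetical and are not part of the JIND code base.

```python
import numpy as np


def fit_class_thresholds(probs, labels, quantile=0.05):
    """Hypothetical per-cell-type rejection thresholds (not JIND's exact rule).

    probs:  (n_cells, n_types) predicted class probabilities on held-out labeled cells.
    labels: (n_cells,) integer true cell-type labels.
    For each type k, the threshold is the `quantile` of P(type k) among cells
    that were both predicted as k and truly of type k.
    """
    preds = probs.argmax(axis=1)
    n_types = probs.shape[1]
    thresholds = np.zeros(n_types)  # types with no correct predictions keep 0.0 (never rejected)
    for k in range(n_types):
        mask = (preds == k) & (labels == k)
        if mask.any():
            thresholds[k] = np.quantile(probs[mask, k], quantile)
    return thresholds


def classify_with_rejection(probs, thresholds, unassigned=-1):
    """Assign the argmax cell type, or `unassigned` if its probability is below that type's threshold."""
    preds = probs.argmax(axis=1)
    conf = probs[np.arange(len(preds)), preds]
    return np.where(conf >= thresholds[preds], preds, unassigned)
```

For example, thresholds fitted on a small validation set can then be applied to new cells:

```python
probs_val = np.array([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]])
labels_val = np.array([0, 0, 1, 1])
th = fit_class_thresholds(probs_val, labels_val)  # approx. [0.805, 0.605]

probs_new = np.array([[0.95, 0.05], [0.55, 0.45], [0.2, 0.8], [0.42, 0.58]])
print(classify_with_rejection(probs_new, th).tolist())  # [0, -1, 1, -1]
```

Using a separate threshold per type, rather than one global cutoff, lets types that the classifier separates less confidently (for instance, closely related differentiating cells) keep a lower bar without loosening the bar for well-separated types.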
