scCapsNet: a deep learning classifier with the capability of interpretable feature extraction, applicable for single cell RNA data analysis

Recent advances in single cell RNA sequencing (scRNA-seq) call for more computational analysis methods. As data for non-characterized cells accumulate quickly, supervised learning models are well suited to classifying non-characterized cells based on previously well-characterized cells. Deep learning models are appropriate tools for vast and complex data such as RNA-seq data, but they lack interpretability. Here, for the first time, we present scCapsNet, a deep learning model adapted from CapsNet. The scCapsNet model retains the capsule part of CapsNet and replaces the convolutional neural network part with several parallel fully connected neural networks. We apply scCapsNet to scRNA-seq data from mouse retinal bipolar cells and human peripheral blood mononuclear cells (PBMC). The results show that scCapsNet performs well as a classifier. They also demonstrate that the parallel fully connected neural networks function as feature detectors, as we supposed. The scCapsNet model quantifies the contribution of each extracted feature to cell type recognition. Furthermore, we mix the RNA expression of two cells of different cell types and then use the scCapsNet model trained on non-mixed data to predict the cell types in the mixed data. The scCapsNet model can predict the cell types in a cell mixture with high accuracy.
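The architecture described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names, the ReLU activation, the layer sizes, the random weights, and the choice of averaging two expression vectors to form a "mixed" cell are all assumptions made for the example. The routing step follows the dynamic routing procedure of CapsNet; the coupling coefficients it returns are one plausible way to read off how much each parallel network (feature detector) contributes to each cell type capsule.

```python
import numpy as np

def squash(s, eps=1e-9):
    # CapsNet squashing: keeps direction, maps vector length into [0, 1).
    sq = np.sum(s ** 2, axis=-1, keepdims=True)
    return (sq / (1.0 + sq)) * s / np.sqrt(sq + eps)

def primary_capsules(x, weights, biases):
    # Each (W, b) pair is one parallel fully connected network.
    # ReLU is an assumption; the abstract does not name the activation.
    return np.stack([np.maximum(0.0, W @ x + b) for W, b in zip(weights, biases)])

def route(u, W_route, n_iter=3):
    # u: (n_primary, d_in); W_route: (n_primary, n_types, d_out, d_in)
    u_hat = np.einsum("ijkl,il->ijk", W_route, u)   # prediction vectors
    b = np.zeros(u_hat.shape[:2])                   # routing logits
    for _ in range(n_iter):
        c = np.exp(b - b.max(axis=1, keepdims=True))
        c /= c.sum(axis=1, keepdims=True)           # softmax over cell types
        v = squash(np.einsum("ij,ijk->jk", c, u_hat))
        b = b + np.einsum("ijk,jk->ij", u_hat, v)   # agreement update
    return v, c  # type capsules, coupling coefficients

def predict(x, weights, biases, W_route):
    v, c = route(primary_capsules(x, weights, biases), W_route)
    lengths = np.linalg.norm(v, axis=-1)            # capsule length ~ type score
    return lengths, c

# Hypothetical sizes and untrained random weights, for shape checking only.
rng = np.random.default_rng(0)
n_genes, n_primary, d_in, n_types, d_out = 50, 4, 8, 3, 16
weights = [rng.normal(0, 0.1, (d_in, n_genes)) for _ in range(n_primary)]
biases = [np.zeros(d_in) for _ in range(n_primary)]
W_route = rng.normal(0, 0.1, (n_primary, n_types, d_out, d_in))

cell_a = rng.random(n_genes)
cell_b = rng.random(n_genes)
mixed = 0.5 * (cell_a + cell_b)   # assumed mixing rule: simple average

lengths, coupling = predict(mixed, weights, biases, W_route)
top_two = np.argsort(lengths)[::-1][:2]   # two highest-scoring cell types
```

With a trained model, the two longest type capsules for a mixed input would be the candidates for the two constituent cell types; with the random weights used here the scores carry no meaning.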
