Using Neural Networks to Improve Single Cell RNA-Seq Data Analysis

While only recently developed, the ability to profile expression data in single cells (scRNA-Seq) has already led to several important studies and findings. However, this technology has also raised several new computational challenges including questions related to handling the noisy and sometimes incomplete data, how to identify unique group of cells in such experiments and how to determine the state or function of specific cells based on their expression profile. To address these issues we develop and test a method based on neural networks (NN) for the analysis and retrieval of single cell RNA-Seq data. We tested various NN architectures, some biologically motivated, and used these to obtain a reduced dimension representation of the single cell expression data. We show that the NN method improves upon prior methods in both, the ability to correctly group cells in experiments not used in the training and the ability to correctly infer cell type or state by querying a database of tens of thousands of single cell profiles. Such database queries (which can be performed using our web server) will enable researchers to better characterize cells when analyzing heterogeneous scRNA-Seq samples. Supporting website: http://sb.cs.cmu.edu/scnn/ Password for accessing the retrieval task webserver: scRNA-Seq

[1]  Chun-Nan Hsu,et al.  Weakly supervised learning of biomedical information extraction from curated data , 2016, BMC Bioinformatics.

[2]  David W. Nauen,et al.  Single-Cell RNA-Seq with Waterfall Reveals Molecular Cascades underlying Adult Neurogenesis. , 2015, Cell stem cell.

[3]  Russ B. Altman,et al.  Missing value estimation methods for DNA microarrays , 2001, Bioinform..

[4]  Geoffrey E. Hinton,et al.  Deep Learning , 2015, Nature.

[5]  Mike Tyers,et al.  BioGRID: a general repository for interaction datasets , 2005, Nucleic Acids Res..

[6]  C. Greene,et al.  ADAGE-Based Integration of Publicly Available Pseudomonas aeruginosa Gene Expression Data with Denoising Autoencoders Illuminates Microbe-Host Interactions , 2016, mSystems.

[7]  Sean R. Davis,et al.  NCBI GEO: archive for functional genomics data sets—update , 2012, Nucleic Acids Res..

[8]  N. Neff,et al.  Reconstructing lineage hierarchies of the distal lung epithelium using single cell RNA-seq , 2014, Nature.

[9]  Christopher Yau,et al.  pcaReduce: hierarchical clustering of single cell transcriptional profiles , 2015, BMC Bioinformatics.

[10]  M. Hendrix,et al.  IRF6 in development and disease: A mediator of quiescence and differentiation , 2008, Cell cycle.

[11]  Bo Wang,et al.  SIMLR: a tool for large-scale single-cell analysis by multi-kernel learning , 2017, bioRxiv.

[12]  I. Amit,et al.  Massively Parallel Single-Cell RNA-Seq for Marker-Free Decomposition of Tissues into Cell Types , 2014, Science.

[13]  R. Sandberg,et al.  Single-Cell RNA-Seq Reveals Dynamic, Random Monoallelic Gene Expression in Mammalian Cells , 2014, Science.

[14]  Yoshua Bengio,et al.  Why Does Unsupervised Pre-training Help Deep Learning? , 2010, AISTATS.

[15]  Lan Bao,et al.  Somatosensory neuron types identified by high-coverage single-cell RNA-sequencing and functional heterogeneity , 2016, Cell Research.

[16]  Ziv Bar-Joseph,et al.  DREM 2.0: Improved reconstruction of dynamic regulatory networks from time-series expression data , 2012, BMC Systems Biology.

[17]  Sandhya Rani,et al.  Human Protein Reference Database—2009 update , 2008, Nucleic Acids Res..

[18]  Christine A. Wells,et al.  Single-Cell Gene Expression Profiles Define Self-Renewing, Pluripotent, and Lineage Primed States of Human Pluripotent Stem Cells , 2014, Stem cell reports.

[19]  Haiyuan Yu,et al.  Detecting overlapping protein complexes in protein-protein interaction networks , 2012, Nature Methods.

[20]  Rona S. Gertner,et al.  Single cell RNA Seq reveals dynamic paracrine control of cellular variation , 2014, Nature.

[21]  Ziv Bar-Joseph,et al.  Identifying proteins controlling key disease signaling pathways , 2013, Bioinform..

[22]  Zhigang Xue,et al.  Simultaneous profiling of transcriptome and DNA methylome from a single cell , 2016, Genome Biology.

[23]  Anna Shcherbina,et al.  Not Just a Black Box: Learning Important Features Through Propagating Activation Differences , 2016, ArXiv.

[24]  Hedi Peterson,et al.  g:Profiler—a web server for functional interpretation of gene lists (2016 update) , 2016, Nucleic Acids Res..

[25]  J. Marioni,et al.  Pooling across cells to normalize single-cell RNA sequencing data with many zero counts , 2016, Genome Biology.

[26]  Angela Yan,et al.  Non-alcoholic fatty liver disease induces signs of Alzheimer’s disease (AD) in wild-type mice and accelerates pathological signs of AD in an AD model , 2016, Journal of Neuroinflammation.

[27]  Hui Wang,et al.  SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis , 2015, PLoS Comput. Biol..

[28]  Sergei Vassilvitskii,et al.  k-means++: the advantages of careful seeding , 2007, SODA '07.

[29]  Charles C. Kim,et al.  Brain trauma elicits non-canonical macrophage activation states , 2016, Journal of Neuroinflammation.

[30]  Paul C. Blainey,et al.  A microfluidic platform enabling single-cell RNA-seq of multigenerational lineages , 2016, Nature Communications.

[31]  E. Pierson,et al.  ZIFA: Dimensionality reduction for zero-inflated single-cell gene expression analysis , 2015, Genome Biology.

[32]  Rona S. Gertner,et al.  Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells , 2013, Nature.

[33]  Allon M. Klein,et al.  Droplet Barcoding for Single-Cell Transcriptomics Applied to Embryonic Stem Cells , 2015, Cell.

[34]  Chen Xu,et al.  Identification of cell types from single-cell transcriptomes using a novel clustering method , 2015, Bioinform..

[35]  Jens Hjerling-Leffler,et al.  Disentangling neural cell diversity using single-cell transcriptomics , 2016, Nature Neuroscience.

[36]  Kurt Hornik,et al.  Multilayer feedforward networks are universal approximators , 1989, Neural Networks.

[37]  C. Tyler-Smith,et al.  Ancient DNA and the rewriting of human history: be sparing with Occam’s razor , 2016, Genome Biology.

[38]  George C Tseng,et al.  Tight Clustering: A Resampling‐Based Approach for Identifying Stable and Tight Patterns in Data , 2005, Biometrics.

[39]  Aleksandra A. Kolodziejczyk,et al.  Single Cell RNA-Sequencing of Pluripotent States Unlocks Modular Transcriptional Variation , 2015, Cell stem cell.

[40]  Pornpimol Charoentong,et al.  Computational genomics tools for dissecting tumour–immune cell interactions , 2016, Nature Reviews Genetics.

[41]  Geoffrey C Gurtner,et al.  The role of focal adhesion complexes in fibroblast mechanotransduction during scar formation. , 2013, Differentiation; research in biological diversity.

[42]  B. Tjaden,et al.  De novo assembly of bacterial transcriptomes from RNA-seq data , 2015, Genome Biology.

[43]  S. Linnarsson,et al.  Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq , 2015, Science.

[44]  K. Mclaughlin,et al.  Mouse chimeras as a system to investigate development, cell and tissue function, disease mechanisms and organ regeneration , 2011, Cell cycle.

[45]  John Varga,et al.  The transcriptional coactivator and acetyltransferase p300 in fibroblast biology and fibrosis , 2007, Journal of cellular physiology.

[46]  Jill P. Mesirov,et al.  Consensus Clustering: A Resampling-Based Method for Class Discovery and Visualization of Gene Expression Microarray Data , 2003, Machine Learning.

[47]  S. Quake,et al.  A survey of human brain transcriptome diversity at the single cell level , 2015, Proceedings of the National Academy of Sciences.

[48]  A. Blais,et al.  Transcriptional control of stem cell fate by E2Fs and pocket proteins , 2015, Front. Genet..

[49]  Fabian J Theis,et al.  Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells , 2015, Nature Biotechnology.

[50]  Cole Trapnell,et al.  The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells , 2014, Nature Biotechnology.

[51]  S. Linnarsson,et al.  Unbiased classification of sensory neuron types by large-scale single-cell RNA sequencing , 2014, Nature Neuroscience.