Unifying single-cell annotations based on the Cell Ontology

Single-cell technologies have rapidly generated an unprecedented amount of data, enabling us to study biological systems at single-cell resolution. However, analyzing datasets generated by independent labs remains challenging because there is no consistent terminology for describing cell types. Here, we present OnClass, an algorithm and accompanying software for automatically classifying cells into cell types represented by a controlled vocabulary derived from the Cell Ontology. OnClass infers cell type similarity from distances in the Cell Ontology. A key advantage of this approach is that OnClass can use the hierarchical structure of the vocabulary to annotate cell types that are not present in the training set. We applied OnClass to diverse collections of single-cell transcriptomic data from both mouse and human and observed substantial improvement in automated cell type annotation. We further demonstrated how OnClass can identify marker genes for cell types both present in and absent from the training set. This suggests that OnClass can serve as a tool for associating marker genes with each term of the Cell Ontology, offering the possibility of refining the Cell Ontology using a data-centric approach.
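As a rough illustration of how ontology distances could let scores for cell types seen in training inform cell types that were never seen, consider the minimal sketch below. It is not the OnClass implementation: the toy ontology fragment, the exponential similarity kernel, and the function names `graph_distances` and `score_all_terms` are all assumptions made for this example.

```python
import math
from collections import deque

# Hypothetical toy fragment (child -> parents); NOT real Cell Ontology content.
TOY_ONTOLOGY = {
    "cell": [],
    "lymphocyte": ["cell"],
    "T cell": ["lymphocyte"],
    "B cell": ["lymphocyte"],
    "neuron": ["cell"],
}


def undirected_neighbors(ontology):
    """Turn child -> parents links into an undirected adjacency map."""
    neighbors = {term: set() for term in ontology}
    for child, parents in ontology.items():
        for parent in parents:
            neighbors[child].add(parent)
            neighbors[parent].add(child)
    return neighbors


def graph_distances(ontology, source):
    """Breadth-first search: number of edges from source to each reachable term."""
    neighbors = undirected_neighbors(ontology)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbors[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def score_all_terms(ontology, seen_scores):
    """Assign every term a similarity-weighted average of the seen-term scores.

    Assumption for this sketch: similarity decays as exp(-distance).
    """
    result = {}
    for term in ontology:
        dist = graph_distances(ontology, term)
        weights = {s: math.exp(-dist[s]) for s in seen_scores if s in dist}
        total = sum(weights.values())
        result[term] = sum(w * seen_scores[s] for s, w in weights.items()) / total
    return result


# Classifier scores for one cell, available only for the two "seen" types.
seen_scores = {"T cell": 0.9, "neuron": 0.1}
scores = score_all_terms(TOY_ONTOLOGY, seen_scores)
```

In this toy setting, "B cell" never appears among the seen types, yet it receives a higher score than "neuron" because it sits closer to "T cell" in the graph.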
