Cellar: Interactive single cell data annotation tool

Several recent technologies and platforms enable the profiling of various molecular signals at the single-cell level. A key question for all studies using such data is the assignment of cell types. To improve the accuracy of cell type assignment in single-omics and multi-omics sequencing studies and in single-cell imaging studies, we developed Cellar, an interactive software tool that supports all steps in the analysis and assignment process. We demonstrate the advantages of Cellar by using it to annotate several HuBMAP datasets from multi-omics single-cell sequencing and spatial proteomics studies. Cellar is freely available and includes several annotated reference HuBMAP datasets.

Availability: https://data.test.hubmapconsortium.org/app/cellar
