matchSCore: Matching Single-Cell Phenotypes Across Tools and Experiments

Single-cell transcriptomics allows the identification of cell types, subtypes and states through cell clustering. In this process, similar cells are grouped before co-expressed marker genes are determined for phenotype inference. The performance of computational tools is directly associated with their marker identification accuracy, but the lack of an optimal solution makes a systematic comparison of methods difficult. Moreover, phenotypes from different studies are hard to integrate because of differences in resolution, methodology and experimental design. In this work we introduce matchSCore (https://github.com/elimereu/matchSCore), an approach to rapidly match cell populations across tools, experiments and technologies. We compared 14 computational methods and evaluated their accuracy in clustering and marker gene identification on simulated data sets. We further used matchSCore to project cell type identities across mouse and human cell atlas projects. Although the data sets originated from different technologies, cell populations could be matched across them, allowing clusters to be assigned to reference maps and annotated.
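To make the idea of matching cell populations through their marker genes concrete, the following is a minimal sketch and not the matchSCore implementation. It assumes that each cluster is summarized by a set of marker genes and that the overlap between two sets is scored with the Jaccard index; the abstract does not specify the scoring function, so this choice is an assumption. The function names (`jaccard`, `match_clusters`) and the marker sets are hypothetical and chosen only for illustration.

```python
from typing import Dict, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index: size of the intersection divided by size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_clusters(
    reference: Dict[str, Set[str]],
    query: Dict[str, Set[str]],
) -> Dict[str, Tuple[str, float]]:
    """Assign each query cluster to the reference cluster with the highest marker overlap.

    This is an illustrative scoring scheme (an assumption), not the published method.
    """
    matches = {}
    for q_name, q_markers in query.items():
        best = max(reference, key=lambda r: jaccard(reference[r], q_markers))
        matches[q_name] = (best, jaccard(reference[best], q_markers))
    return matches


# Hypothetical marker sets, for illustration only
reference_markers = {
    "T cells": {"CD3D", "CD3E", "IL7R"},
    "B cells": {"CD79A", "MS4A1", "CD19"},
}
query_markers = {
    "cluster_0": {"CD79A", "MS4A1", "CD74"},
    "cluster_1": {"CD3D", "IL7R", "CCR7"},
}

matches = match_clusters(reference_markers, query_markers)
for cluster, (label, score) in matches.items():
    print(f"{cluster} -> {label} (Jaccard = {score:.2f})")
```

Because the comparison uses only gene sets, a scheme of this kind does not depend on the expression scale or technology of either data set, which is one reason a marker-based comparison can work across experiments.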
