CHETAH: a selective, hierarchical cell type identification method for single-cell RNA sequencing

Cell type identification is essential for single-cell RNA sequencing (scRNA-seq) studies, which are currently transforming the life sciences. CHETAH (CHaracterization of cEll Types Aided by Hierarchical clustering) is an accurate cell type identification algorithm that is rapid and selective, allowing cells to be labeled as intermediate or unassigned. Evidence for assignment is based on a classification tree of previously available scRNA-seq reference data and includes a confidence score based on the variance in gene expression per cell type. For cell types represented in the reference data, CHETAH’s accuracy is as good as that of existing methods. Its specificity is superior when cells of an unknown type are encountered, such as malignant cells in tumor samples, which it pinpoints as intermediate or unassigned. Although designed for tumor samples in particular, the use of unassigned and intermediate types is also valuable in other exploratory studies. This is exemplified in pancreas datasets, where CHETAH highlights cell populations not well represented in the reference dataset, including cells with profiles that lie on a continuum between those of acinar and ductal cell types. Having the possibility of unassigned and intermediate cell types is pivotal for preventing misclassification and can yield important biological information for previously unexplored tissues.
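The idea of stepwise assignment along a classification tree, stopping early when the evidence is weak, can be illustrated with a toy sketch. This is not CHETAH's implementation. It assumes a hand-written binary tree, made-up six-gene reference profiles, Pearson correlation as the similarity measure, and the difference between the two branch correlations as a stand-in for the variance-based confidence score described above. The names `classify`, `TREE`, `PROFILES`, the threshold of 0.1, and the rule that a stop at the root means "unassigned" while a stop deeper in the tree means "intermediate" are all illustrative assumptions.

```python
from statistics import mean

# Hypothetical reference profiles: mean expression of six genes per cell type.
PROFILES = {
    "T cell": [10, 8, 0, 0, 0, 0],
    "B cell": [10, 0, 8, 0, 0, 0],
    "acinar": [0, 0, 0, 10, 8, 0],
    "ductal": [0, 0, 0, 10, 0, 8],
}

# Hypothetical classification tree: a leaf is a cell type name,
# an internal node is a (left, right) tuple.
TREE = (("T cell", "B cell"), ("acinar", "ductal"))


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if degenerate)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def leaves(tree):
    """List the cell types below a node."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def branch_profile(tree, profiles):
    """Average the reference profiles of all cell types below a node."""
    return [mean(vals) for vals in zip(*(profiles[n] for n in leaves(tree)))]


def classify(cell, tree, profiles, threshold=0.1):
    """Walk down the tree; stop early when the two branches are too close.

    Returns (label, confidence of the last decision made).
    """
    node, at_root, confidence = tree, True, None
    while not isinstance(node, str):
        left, right = node
        s_left = pearson(cell, branch_profile(left, profiles))
        s_right = pearson(cell, branch_profile(right, profiles))
        # Stand-in confidence: how clearly one branch beats the other.
        confidence = abs(s_left - s_right)
        if confidence < threshold:
            if at_root:
                return "unassigned", confidence
            return "intermediate: " + "/".join(leaves(node)), confidence
        node = left if s_left > s_right else right
        at_root = False
    return node, confidence


if __name__ == "__main__":
    for cell in ([9, 7, 1, 0, 0, 0], [0, 0, 0, 9, 4, 4], [5, 2, 2, 5, 2, 2]):
        print(cell, "->", classify(cell, TREE, PROFILES)[0])
```

With these toy inputs, the first cell reaches a leaf, the second stops at the acinar/ductal node because it resembles both equally, and the third stops at the root because neither top-level branch fits better than the other. The early stop is what prevents a forced, and possibly wrong, final label.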
