SCINA: Semi-Supervised Analysis of Single Cells in silico

Advances in single-cell RNA sequencing (scRNA-Seq) have allowed for comprehensive analyses of single-cell data. However, current analyses of scRNA-Seq data usually start from unsupervised clustering or visualization. These methods ignore prior knowledge of transcriptomes and of the probable structures of the data. Moreover, cell identification then relies heavily on subjective and inaccurate human inspection. We reversed this paradigm and developed SCINA, a semi-supervised model for the analysis of scRNA-Seq data, flow cytometry/CyTOF data, and other data of similar format, which automatically exploits previously established gene signatures using an expectation-maximization (EM) algorithm. We applied SCINA to a wide range of datasets and showed that its accuracy, stability, and efficiency exceeded those of most popular unsupervised approaches. Notably, SCINA discovered an intermediate stage of oligodendrocytes in mouse brain scRNA-Seq data. SCINA also detected a shift in immune cell populations in Stk4 knock-out mouse cytometry data. Finally, SCINA identified a new kidney tumor clade with similarity to FH-deficient tumors from bulk tumor data. Overall, SCINA provides both methodological advances and biological insights from perspectives different from those of traditional analytical methods.
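To make the idea of signature-guided EM concrete, the following is a minimal toy sketch in Python. It is not the SCINA implementation. It assumes each signature gene follows a Gaussian with a "high" mean in cells of its designated type and a "low" mean in all other cells, with one variance per gene. The function name `signature_em`, its parameters, and the synthetic data are all hypothetical and serve only as illustration. An "unknown" cell class is deliberately omitted for brevity.

```python
import numpy as np

def signature_em(expr, signatures, n_iter=50):
    """Toy signature-guided EM (illustrative assumption, not the SCINA code).

    expr: (n_genes, n_cells) array of expression values.
    signatures: dict mapping a cell-type name to a list of gene row indices.
    Returns (labels, resp), where resp has shape (n_types, n_cells).
    """
    names = list(signatures)
    genes = sorted({g for s in signatures.values() for g in s})
    pos = {g: i for i, g in enumerate(genes)}
    x = expr[genes, :]                       # signature genes only: (G, n_cells)
    n_cells = x.shape[1]

    # member[k, g] is True when gene g belongs to the signature of type k.
    member = np.zeros((len(names), len(genes)), dtype=bool)
    for k, name in enumerate(names):
        member[k, [pos[g] for g in signatures[name]]] = True

    # Initialise the two expression modes from quantiles of each gene.
    mu_lo = np.quantile(x, 0.25, axis=1)
    mu_hi = np.quantile(x, 0.75, axis=1)
    var = x.var(axis=1) + 1e-6
    prior = np.full(len(names), 1.0 / len(names))

    for _ in range(n_iter):
        # E-step: posterior probability of each type for each cell.
        means = np.where(member, mu_hi, mu_lo)            # (K, G)
        diff = x[None, :, :] - means[:, :, None]          # (K, G, n_cells)
        v = var[None, :, None]
        loglik = -0.5 * (diff ** 2 / v + np.log(2 * np.pi * v)).sum(axis=1)
        logpost = loglik + np.log(prior)[:, None]
        logpost -= logpost.max(axis=0, keepdims=True)     # numerical stability
        resp = np.exp(logpost)
        resp /= resp.sum(axis=0, keepdims=True)

        # M-step: w_hi[g, c] is the probability that cell c belongs to a
        # type whose signature contains gene g.
        w_hi = np.clip(member.T.astype(float) @ resp, 0.0, 1.0)
        w_lo = 1.0 - w_hi
        mu_hi = (w_hi * x).sum(axis=1) / (w_hi.sum(axis=1) + 1e-9)
        mu_lo = (w_lo * x).sum(axis=1) / (w_lo.sum(axis=1) + 1e-9)
        sq = w_hi * (x - mu_hi[:, None]) ** 2 + w_lo * (x - mu_lo[:, None]) ** 2
        var = sq.sum(axis=1) / n_cells + 1e-6
        prior = np.clip(resp.mean(axis=1), 1e-6, None)
        prior /= prior.sum()

    labels = [names[k] for k in resp.argmax(axis=0)]
    return labels, resp

# Synthetic demo (hypothetical data): 6 genes, 60 cells, two cell types.
rng = np.random.default_rng(0)
expr = rng.normal(0.0, 0.5, size=(6, 60))
expr[0:3, :30] += 5.0    # genes 0-2 are high in the first 30 cells
expr[3:6, 30:] += 5.0    # genes 3-5 are high in the last 30 cells
labels, resp = signature_em(expr, {"A": [0, 1, 2], "B": [3, 4, 5]})
```

In this sketch the prior knowledge enters only through the `member` mask, which fixes which genes are expected to be high in which type; everything else is estimated from the data. Unlike unsupervised clustering, the resulting groups carry their cell-type names directly, with no manual annotation step afterwards.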
