SIDEseq: A Cell Similarity Measure Defined by Shared Identified Differentially Expressed Genes for Single-Cell RNA sequencing Data

One goal of single-cell RNA sequencing (scRNA-seq) is to expose heterogeneity within cell populations that arises from meaningful biological variation. Examining cell-to-cell heterogeneity, and further identifying subpopulations of cells from scRNA-seq data, is of common interest in life science research. A key component of successfully identifying cell subpopulations (that is, clustering cells) is the (dis)similarity measure used to group the cells. In this paper, we introduce a novel measure, named SIDEseq, to assess cell-to-cell similarity using scRNA-seq data. SIDEseq first identifies a list of putative differentially expressed (DE) genes for each pair of cells. It then integrates the information from the DE gene lists of all pairs of cells to build a similarity measure between any two cells. SIDEseq can be used with any clustering algorithm that requires a (dis)similarity matrix. The new measure incorporates information from all cells when evaluating the similarity between any two cells, a characteristic not commonly found in existing (dis)similarity measures. This property is advantageous for two reasons: (a) borrowing information from cells of different subpopulations allows pairwise cell relationships to be investigated from a global perspective, and (b) information from other cells of the same subpopulation can help ensure a robust relationship assessment. We applied SIDEseq to a newly generated human ovarian cancer scRNA-seq dataset, a public human embryo scRNA-seq dataset, and several simulated datasets. The clustering results suggest that the SIDEseq measure is capable of uncovering important relationships between cells, and that it performs at least as well as, and in some cases better than, several popular (dis)similarity measures on these datasets.
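The two-step idea described above (a putative DE gene list for each pair of cells, then a similarity built from the lists of all pairs) can be illustrated with a minimal sketch. The abstract does not specify the DE statistic or the integration rule, so both choices here are assumptions made for illustration only: a DE list is taken to be the `n_top` genes with the largest absolute difference in log expression, and two cells are scored by the average Jaccard overlap of their DE lists against every other cell. The function names `de_gene_sets` and `shared_de_similarity` are hypothetical and are not part of any SIDEseq software.

```python
import numpy as np


def de_gene_sets(log_expr, n_top):
    """Return {(i, j): frozenset of gene indices} for every pair of cells.

    ASSUMPTION: the "putative DE genes" of a pair are the n_top genes with
    the largest absolute difference in log expression. The paper's actual
    DE statistic may differ.
    """
    n_cells = log_expr.shape[0]
    sets = {}
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            diff = np.abs(log_expr[i] - log_expr[j])
            top = np.argsort(diff)[-n_top:]  # indices of the largest differences
            sets[(i, j)] = sets[(j, i)] = frozenset(top.tolist())
    return sets


def shared_de_similarity(log_expr, n_top=20):
    """Cell-by-cell similarity built from shared DE genes (illustrative).

    ASSUMPTION: cells i and j are similar when, compared against each other
    cell k, they yield overlapping DE lists. The overlap is measured with the
    Jaccard index and averaged over all k other than i and j.
    """
    n_cells = log_expr.shape[0]
    de = de_gene_sets(log_expr, n_top)
    sim = np.eye(n_cells)  # each cell is maximally similar to itself
    for i in range(n_cells):
        for j in range(i + 1, n_cells):
            scores = []
            for k in range(n_cells):
                if k == i or k == j:
                    continue
                a, b = de[(i, k)], de[(j, k)]
                scores.append(len(a & b) / len(a | b))
            sim[i, j] = sim[j, i] = float(np.mean(scores)) if scores else 0.0
    return sim
```

Because every entry of the resulting matrix depends on comparisons with all other cells, the sketch shows the "global perspective" property in miniature. Under these assumptions, `1 - sim` could serve as the dissimilarity matrix for any clustering routine that accepts one.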
