A multitask clustering approach for single-cell RNA-seq analysis in Recessive Dystrophic Epidermolysis Bullosa

Single-cell RNA sequencing (scRNA-seq) has been widely applied to discover new cell types by detecting sub-populations in a heterogeneous group of cells. Because scRNA-seq experiments have lower read coverage (tag counts) and introduce more technical biases than bulk RNA-seq experiments, the limited number of sampled cells, combined with experimental biases and other dataset-specific variation, makes it difficult to analyze data across datasets and to discover relevant biological variation across multiple cell populations. In this paper, we introduce variance-driven multitask clustering of single-cell RNA-seq data (scVDMC), a method that utilizes multiple single-cell populations from biological replicates or different samples. scVDMC clusters single cells from multiple scRNA-seq experiments with similar cell types and markers but varying expression patterns, so that the data are better integrated than in typical pooled analyses, which only increase the sample size. By controlling the variance among the cell clusters within each dataset and across all the datasets, scVDMC detects cell sub-populations in each individual experiment that share cell-type markers but whose cluster centers vary among the experiments. Applied to two real scRNA-seq datasets with several replicates and to one large-scale droplet-based dataset from three patient samples, scVDMC detected cell populations and known cell markers more accurately than pooled clustering and other recently proposed scRNA-seq clustering methods. In a case study on in-house Recessive Dystrophic Epidermolysis Bullosa (RDEB) scRNA-seq data, scVDMC revealed several new cell types and previously unknown markers, validated by flow cytometry. MATLAB/Octave code is available at https://github.com/kuanglab/scVDMC.
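The abstract states the idea behind scVDMC (per-dataset clusters whose centers may shift, tied together by a cross-dataset variance term) but not its objective. As a rough illustration only, the sketch below implements a generic multitask k-means that is assumed to capture the "varying cluster centers" part: each dataset t keeps its own centers `centers[t][k]`, and a penalty `lam * ||centers[t][k] - shared[k]||^2` pulls them toward shared centers. This is not the scVDMC objective or code: it omits the shared marker-gene selection, it is written in Python rather than the MATLAB/Octave of the released code, and all names (`multitask_kmeans`, `lam`, `init_centers`, `shared`) are hypothetical. Each pass alternates three closed-form updates, so the objective never increases.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x: Sequence[float], cs: List[Vector]) -> int:
    """Index of the center in cs closest to x (the first index wins ties)."""
    return min(range(len(cs)), key=lambda k: sq_dist(x, cs[k]))


def objective(datasets, labels, centers, shared, lam: float) -> float:
    """Within-cluster scatter of every dataset plus lam times the spread of
    each dataset's centers around the shared centers (the quantity the
    updates below decrease)."""
    within = sum(
        sq_dist(x, centers[t][z])
        for t, data in enumerate(datasets)
        for x, z in zip(data, labels[t])
    )
    across = sum(
        sq_dist(centers[t][k], shared[k])
        for t in range(len(datasets))
        for k in range(len(shared))
    )
    return within + lam * across


def multitask_kmeans(
    datasets: List[List[Vector]],
    init_centers: List[Vector],
    lam: float = 1.0,
    n_iter: int = 50,
) -> Tuple[List[List[int]], List[List[Vector]], List[Vector]]:
    """Hypothetical multitask k-means: per-dataset centers tied to shared ones."""
    n_tasks, k_count, dim = len(datasets), len(init_centers), len(init_centers[0])
    shared = [list(c) for c in init_centers]
    centers = [[list(c) for c in init_centers] for _ in range(n_tasks)]
    labels: List[List[int]] = [[] for _ in range(n_tasks)]

    for _ in range(n_iter):
        # 1) Assignment: each cell joins the nearest center of its own dataset.
        for t, data in enumerate(datasets):
            labels[t] = [nearest(x, centers[t]) for x in data]

        # 2) Dataset-specific centers: the closed-form minimizer of
        #    sum_i ||x_i - c||^2 + lam * ||c - shared[k]||^2 is the cluster
        #    mean shrunk toward the shared center.
        for t, data in enumerate(datasets):
            for k in range(k_count):
                members = [x for x, z in zip(data, labels[t]) if z == k]
                denom = len(members) + lam
                if denom == 0:
                    continue  # empty cluster and no penalty: keep the old center
                centers[t][k] = [
                    (sum(x[d] for x in members) + lam * shared[k][d]) / denom
                    for d in range(dim)
                ]

        # 3) Shared centers: the across-dataset mean minimizes the penalty term.
        shared = [
            [sum(centers[t][k][d] for t in range(n_tasks)) / n_tasks for d in range(dim)]
            for k in range(k_count)
        ]

    return labels, centers, shared
```

With `lam = 0` each dataset is clustered independently and its centers are plain cluster means; as `lam` grows, every dataset's centers collapse onto the shared ones and the method approaches pooled k-means. Intermediate values keep cluster identities common to all datasets while letting each dataset's centers shift, which is the trade-off the abstract attributes to scVDMC.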
