Pseudo-grading of tumor subclones using phenotype algebra

Robust characterization of cellular phenotypes from single-cell gene expression data is essential for studying complex biological systems and diseases. Single-cell RNA-sequencing (scRNA-seq), coupled with robust computational analysis, facilitates characterization of phenotypic heterogeneity in tumors. Current scRNA-seq analysis pipelines can accurately identify a myriad of malignant and non-malignant cell subtypes from single-cell profiles of tumor microenvironments. However, given the extent of phenotypic heterogeneity, it is not straightforward to assess the risk associated with individual malignant cell subpopulations in a tumor, primarily because of the complexity of the cancer phenotype space and the lack of clinical annotations in tumor scRNA-seq studies that involve prospectively collected tissue samples. Effective risk-stratification of individual malignant subclones holds promise for formulating tailored therapeutic interventions. To this end, we present SCellBOW, a computational approach that facilitates risk-stratification by leveraging scRNA-seq profiles and language modeling techniques. We compared SCellBOW with existing best-practice methods for its ability to precisely represent phenotypically divergent cell types across multiple scRNA-seq datasets, including our in-house generated human splenocyte and matched peripheral blood mononuclear cell (PBMC) dataset. SCellBOW supports algebraic operations such as '+' and '-' on single cells in the latent space while preserving their biological meaning. This feature enables simulation of the residual phenotype of a tumor following positive and negative selection of specific malignant cell subtypes. As a proof of concept, we tested and validated phenotype algebra across three independent cancer types: glioblastoma multiforme, breast cancer, and metastatic prostate cancer. In particular, we demonstrate how the negative selection of specific clones may lead to variable prognosis. From the metastatic prostate cancer scRNA-seq data, SCellBOW identifies a hitherto unknown and pervasive AR−/NElow (androgen receptor negative, neuroendocrine-low) malignant cell subpopulation with a conspicuously high predictive risk score. We traced this subpopulation back in a large-scale spatial omics atlas of 141 well-characterized metastatic prostate cancer samples at spot resolution.
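The idea of phenotype algebra can be illustrated with a minimal sketch. It assumes that every cell already has a latent embedding vector and a subclone label, and that a tumor or a subclone can be summarized by the mean of its cell embeddings. The function names (`centroid`, `phenotype_algebra`), the toy data, and the centroid summary are illustrative assumptions and are not taken from the SCellBOW implementation. Assigning a risk score to the resulting vector would additionally require a survival model trained on annotated data, which is not shown here.

```python
import numpy as np


def centroid(embeddings, mask=None):
    """Mean embedding of the selected cells (all cells if mask is None)."""
    cells = embeddings if mask is None else embeddings[mask]
    return cells.mean(axis=0)


def phenotype_algebra(embeddings, labels, subclone, op="-"):
    """Add or subtract a subclone centroid from the whole-tumor centroid.

    Hypothetical helper: op="-" mimics negative selection (removing the
    subclone's contribution), op="+" mimics positive selection.
    """
    total = centroid(embeddings)
    clone = centroid(embeddings, labels == subclone)
    if op == "-":
        return total - clone
    if op == "+":
        return total + clone
    raise ValueError(f"unsupported operation: {op!r}")


# Toy data: 6 cells in a 4-dimensional latent space, 3 subclones.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 4))
labels = np.array(["A", "A", "B", "B", "C", "C"])

# Simulated residual phenotype after negative selection of subclone "B".
residual = phenotype_algebra(embeddings, labels, "B", op="-")
print(residual.shape)
```

In this sketch, the residual vector could be passed to any downstream predictor in place of the whole-tumor vector, and comparing the two predictions would indicate how much the removed subclone contributes to the estimated risk.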
