Cancer Gene Profiling through Unsupervised Discovery

Precision medicine is a paradigm shift in healthcare that relies heavily on genomics data. However, the complexity of biological interactions, the large number of genes, and the lack of comparative studies of data analysis methods remain a major bottleneck for clinical adoption. In this paper, we introduce a novel, automatic and unsupervised framework to discover low-dimensional gene biomarkers. Our method is based on the LP-Stability algorithm, a high-dimensional center-based unsupervised clustering algorithm, which is modular with respect to the metric function, scalable, and able to automatically determine the best number of clusters. Our evaluation includes both mathematical and biological criteria. The recovered signature is applied to a variety of biological tasks, including screening of biological pathways and functions, and assessing its relevance for characterizing tumor types and subtypes. Quantitative comparisons among different distance metrics, commonly used clustering methods and a referential gene signature used in the literature confirm the state-of-the-art performance of our approach. In particular, our signature, which is based on 27 genes, achieves at least 30 times better mathematical significance (average Dunn’s Index) and 25% better biological significance (average Enrichment in Protein-Protein Interaction) than those produced by other referential clustering methods. Finally, our signature shows promising results in distinguishing immune inflammatory from immune desert tumors, while achieving a high balanced accuracy of 92% on tumor type classification and an average balanced accuracy of 68% on tumor subtype classification, which are, respectively, 7% and 9% higher than the performance of the referential signature.
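To make the mathematical criterion concrete, the following is a minimal sketch of Dunn's Index, computed as the smallest distance between points in different clusters divided by the largest distance between points in the same cluster. It assumes Euclidean distance and this particular (single-linkage separation, complete diameter) variant; the function name `dunn_index` and the toy data are illustrative and are not taken from the paper, which may use a different variant or metric.

```python
import numpy as np


def dunn_index(points, labels):
    """Dunn's Index: min inter-cluster distance / max intra-cluster diameter.

    Assumes Euclidean distance (an illustrative choice, not the paper's).
    Higher values indicate compact, well-separated clusters.
    """
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)

    # Pairwise Euclidean distance matrix, shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))

    # same[i, j] is True when samples i and j share a cluster label.
    same = labels[:, None] == labels[None, :]
    if same.all():
        raise ValueError("need at least two clusters")

    max_diameter = dist[same].max()     # largest within-cluster distance
    min_separation = dist[~same].min()  # smallest between-cluster distance
    if max_diameter == 0:
        raise ValueError("every cluster is a single point; index undefined")
    return min_separation / max_diameter


# Toy data: two tight clusters that lie far apart.
toy_points = [[0, 0], [0, 1], [10, 0], [10, 1]]
toy_labels = [0, 0, 1, 1]
print(dunn_index(toy_points, toy_labels))
```

The full pairwise distance matrix keeps the sketch short but uses memory quadratic in the number of samples, so a blockwise computation would be needed for large gene sets.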
