Deep Learning to Discover Cancer Glycome Genes Signifying the Origins of Cancer

Background: Aberrant protein glycosylation is a common feature of cancer and contributes to malignant behavior. However, how and to what extent the cellular glycome is involved in cancer development and progression remains undefined. The primary objective of this study is the in silico identification of glycome genes whose expression profiles in cancer genomes reveal a signature of cancer. A list of approximately 500 glycome genes, spanning several molecular categories, is available. This study is based on the hypothesis that, if glycosylation is a common feature of cancer, there should exist a shortlist of cancer glycome genes whose expression profiles carry a signature capable of differentiating the 33 cancer types available in The Cancer Genome Atlas (TCGA).

Method: The distribution of cancer samples in TCGA is highly imbalanced, ranging from 36 samples for cholangiocarcinoma (CHOL) to 1,089 for breast cancer (BRCA). Supervised feature selection approaches for identifying signature genes would therefore be biased toward the larger groups. We developed a computational framework using the concrete autoencoder (CAE), a deep learning-based unsupervised feature selection algorithm, to find the cancer-related glycome genes. The criteria for the optimal feature subset are that (a) the number of features should be as small as possible, and (b) classification accuracy using the selected features should exceed 90%.

Results: Our experiments identified a shortlist of 132 glycome genes that can differentiate the 33 cancer types with an accuracy of 92%. This indicates that cancer glycome genes signify the origins of cancer.
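The concrete autoencoder mentioned in the Method paragraph can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a single concrete selector layer with k units (each unit is a Gumbel-softmax relaxation of a one-hot choice over the input genes), a linear decoder chosen for brevity, full-batch Adam, and exponential temperature annealing. The function name `concrete_autoencoder_select` and every hyperparameter (`epochs`, `lr`, `t_start`, `t_end`) are hypothetical choices made for this illustration. No class labels are used anywhere, which is what makes the selection insensitive to the sample imbalance described above.

```python
import numpy as np


def concrete_autoencoder_select(X, k, epochs=400, lr=0.05, t_start=10.0,
                                t_end=0.01, seed=0):
    """Hypothetical sketch: pick k columns of X with a concrete selector layer.

    X is assumed to be an (n_samples, n_features) matrix that the caller has
    already normalized (e.g. standardized expression values).
    Returns (selected_feature_indices, per_epoch_reconstruction_losses).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    logits = 0.01 * rng.standard_normal((k, d))   # selector parameters
    W = 0.01 * rng.standard_normal((k, d))        # linear decoder weights
    b = np.zeros(d)                               # decoder bias
    params = [logits, W, b]
    m = [np.zeros_like(p) for p in params]        # Adam first moments
    v = [np.zeros_like(p) for p in params]        # Adam second moments
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []

    for t in range(epochs):
        # Exponential temperature annealing from t_start down to t_end.
        T = t_start * (t_end / t_start) ** (t / max(epochs - 1, 1))

        # Gumbel-softmax: one relaxed one-hot vector per selector unit.
        s = (logits + rng.gumbel(size=(k, d))) / T
        s -= s.max(axis=1, keepdims=True)         # numerical stability
        M = np.exp(s)
        M /= M.sum(axis=1, keepdims=True)         # (k, d), rows sum to 1

        # Encoder output: each latent unit is a soft pick of one feature.
        Z = X @ M.T                               # (n, k)
        X_hat = Z @ W + b                         # (n, d) reconstruction
        diff = X_hat - X
        losses.append(float(np.mean(diff ** 2)))

        # Backpropagate the mean squared reconstruction error.
        d_xhat = 2.0 * diff / diff.size
        dW = Z.T @ d_xhat
        db = d_xhat.sum(axis=0)
        dZ = d_xhat @ W.T                         # (n, k)
        dM = dZ.T @ X                             # (k, d)
        # Softmax Jacobian-vector product, then the 1/T factor from s.
        d_logits = M * (dM - np.sum(dM * M, axis=1, keepdims=True)) / T

        # In-place Adam update of logits, W and b.
        for i, (p, grad) in enumerate(zip(params, [d_logits, dW, db])):
            m[i] = beta1 * m[i] + (1 - beta1) * grad
            v[i] = beta2 * v[i] + (1 - beta2) * grad ** 2
            m_hat = m[i] / (1 - beta1 ** (t + 1))
            v_hat = v[i] / (1 - beta2 ** (t + 1))
            p -= lr * m_hat / (np.sqrt(v_hat) + eps)

    # After training, each selector keeps its highest-logit feature.
    selected = np.argmax(logits, axis=1)
    return selected, losses
```

In the setting the abstract describes, `X` would hold the glycome-gene expression values for all TCGA samples, and the sketch would be rerun for several values of `k`. The smallest `k` whose selected genes let a downstream classifier exceed 90% accuracy would satisfy criteria (a) and (b). The classifier itself is not specified here. Note that two selector units can converge on the same gene, so the number of distinct genes returned may be smaller than `k`.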
