Constructing a molecular subtype model of colon cancer using machine learning

Background: Colon cancer (CRC) is a malignant tumor with a high incidence worldwide. Most previous studies of CRC have focused on clinical research, but as the disease has been studied in greater depth, its molecular mechanisms have become increasingly important. Machine learning is now widely used in medicine; combining it with the study of molecular mechanisms may improve our understanding of CRC pathogenesis and support the development of new treatments.
Methods and materials: We constructed molecular subtypes of colon cancer in R and then explored prognostic genes with GEPIA2. Enrichment analysis of the differential genes was performed with WebGestalt. Protein–protein interaction networks of the differential genes were constructed using the STRING database and Cytoscape. The TIMER2.0 and TISIDB databases were used to investigate the correlation of these genes with immune-infiltrating cells and immune targets, and the cBioPortal database was used to explore genomic alterations.
Results: We constructed a molecular prognostic model of CRC to study its prognostic factors and found that Charcot–Leyden crystal galectin (CLC), zymogen granule protein 16 (ZG16), leucine-rich repeat-containing protein 26 (LRRC26), intelectin 1 (ITLN1), UDP-GlcNAc:betaGal beta-1,3-N-acetylglucosaminyltransferase 6 (B3GNT6), chloride channel accessory 1 (CLCA1), growth factor independent 1 transcriptional repressor (GFI1), aquaporin 8 (AQP8), HEPACAM family member 2 (HEPACAM2), and UDP glucuronosyltransferase family 2 member B15 (UGT2B15) were correlated with the subtype model of CRC prognosis. Enrichment analysis showed that the differential genes were mainly associated with immune-inflammatory pathways. GFI1 and CLC were associated with immune cells, immunoinhibitors, and immunostimulators. Genomic analysis showed no significant alterations in the differential genes.
Conclusion: By constructing molecular subtypes of colon cancer, we identified new prognostic markers for colon cancer, which may provide direction for future treatments.
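
The abstract states that the molecular subtypes were built in R but does not describe the algorithm. As an illustration only, the following Python sketch shows one common approach to class discovery on expression data, consensus clustering: samples are repeatedly subsampled and clustered, and the fraction of runs in which two samples land in the same cluster is recorded. Everything here is an assumption made for the example rather than the authors' pipeline: the function names (`kmeans`, `consensus_matrix`, `assign_subtypes`), the toy expression values, the 80% subsampling rate, and the simple thresholding step used to read off subtypes.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two expression profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=10):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    # Farthest-first initialisation: deterministic for a given subsample.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def consensus_matrix(samples, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resampling runs in which each pair of samples co-clusters."""
    rng = random.Random(seed)
    n = len(samples)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(n_resamples):
        idx = rng.sample(range(n), max(k, int(frac * n)))
        labels = kmeans([samples[i] for i in idx], k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


def assign_subtypes(consensus, threshold=0.5):
    """Group samples linked by consensus above the threshold (connected components).

    This is a simplification; dedicated tools typically apply hierarchical
    clustering to the consensus matrix instead.
    """
    n = len(consensus)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and consensus[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


# Invented toy data: 10 samples x 3 genes, two clearly separated expression groups.
profiles = [
    (1.0, 1.2, 0.9), (1.1, 0.8, 1.0), (0.9, 1.0, 1.3), (1.2, 1.1, 0.8), (0.8, 0.9, 1.1),
    (8.0, 7.9, 8.2), (8.1, 8.3, 7.8), (7.9, 8.0, 8.1), (8.2, 7.8, 8.0), (7.8, 8.1, 7.9),
]
consensus = consensus_matrix(profiles, k=2)
subtypes = assign_subtypes(consensus)
print(subtypes)
```

In practice the number of clusters `k` is not known in advance; a typical workflow repeats the procedure for several values of `k` and keeps the one whose consensus matrix is closest to a clean block structure.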
