Identification of Significantly Expressed Gene Mutations for Automated Classification of Benign and Malignant Prostate Cancer

Among males, prostate cancer (PCa) is the cancer type with the highest prevalence and the second leading cause of cancer deaths. Current screening methods for prostate cancer, such as the prostate-specific antigen (PSA) test and the digital rectal exam (DRE), lack effectiveness. Machine learning models have been used to predict PCa progression, Gleason score, and laterality. In this paper, we employ machine learning techniques, namely a Bayesian approach, Support Vector Machines (SVM), Decision Trees, Logistic Regression, K-Nearest Neighbors, Random Forest, and AdaBoost, to distinguish malignant prostate cancers from benign ones. Moreover, different feature extraction strategies are proposed to improve detection performance and to identify potential genomic biomarkers. The results show that the Lasso feature set yielded high performance across the models, with SVM achieving a classification accuracy of 97%. The Lasso and SVM combination reported many significant biomarker genes and gene mutations, including but not limited to CA2320112, CA2328529, and CA2436168.
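As a rough illustration of the Lasso and SVM combination described above, the following is a minimal sketch using scikit-learn. It is not the authors' implementation: the data are synthetic (generated with `make_classification`), and the Lasso penalty `alpha=0.05`, the linear SVM kernel, the 5-fold cross-validation, and all variable names are assumptions chosen only to make the example self-contained.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a genomic feature matrix (assumption: the real
# study uses gene/mutation features, which are not available here).
# Rows are samples, columns are features; y is 0 (benign) or 1 (malignant).
X, y = make_classification(
    n_samples=200,
    n_features=500,
    n_informative=10,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)

# Lasso drives most coefficients to exactly zero; SelectFromModel keeps
# only the features whose coefficient is non-zero. alpha is an assumed value.
pipeline = Pipeline(
    steps=[
        ("scale", StandardScaler()),
        ("select", SelectFromModel(Lasso(alpha=0.05, max_iter=10000))),
        ("svm", SVC(kernel="linear")),
    ]
)

# Feature selection sits inside the pipeline, so it is refit on each
# training fold and never sees the held-out fold.
scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
mean_accuracy = float(np.mean(scores))

# Refit on all samples to list the selected feature indices, which play
# the role of candidate biomarkers in this sketch.
pipeline.fit(X, y)
selected = np.flatnonzero(pipeline.named_steps["select"].get_support())

print(f"mean CV accuracy: {mean_accuracy:.3f}")
print(f"selected {selected.size} of {X.shape[1]} features")
```

Keeping the selection step inside the pipeline is a deliberate choice in this sketch: selecting features on the full dataset before cross-validation would leak label information into the held-out folds and inflate the reported accuracy.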
