A novel feature reduction method to improve the performance of machine learning models

Developing radiomics-based machine learning (ML) models has drawn considerable attention in recent years. However, identifying a small, optimal feature vector from which to build a robust ML model remains a challenging and unresolved problem. In this study, we investigated the feasibility of applying a random projection algorithm to create an optimal feature vector from a large pool of features generated by a computer-aided detection (CAD) scheme, and thereby to improve the performance of the ML model. We assembled a retrospective dataset of abdominal computed tomography (CT) images acquired from 188 patients diagnosed with gastric cancer. Of these, 141 cases had peritoneal metastasis (PM) and 47 did not. The CAD scheme was applied to segment the gastric tumor area and computed 325 image features. Two logistic regression models were then built, each embedded with a different feature dimensionality reduction method: principal component analysis (PCA) or a random projection algorithm (RPA). A synthetic minority oversampling technique (SMOTE) was then used to balance the dataset. The models were built to predict the risk of PM in patients with advanced gastric cancer (AGC). All logistic regression models were trained and tested using a leave-one-case-out cross-validation method. The logistic regression model embedded with RPA yielded a significantly higher area under the ROC curve (AUC) (0.69 ± 0.025) than the model using PCA (0.62 ± 0.014) (p < 0.05). The study demonstrated that CT images of gastric tumors contain discriminatory information for predicting the risk of PM in AGC patients, and that RPA is a promising method for generating an optimal feature vector and improving the performance of ML models of medical images.
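The random projection idea can be illustrated with a minimal NumPy sketch. It assumes a Gaussian projection matrix whose entries are drawn i.i.d. from N(0, 1/k), one common construction motivated by the Johnson-Lindenstrauss lemma; the function names, the Gaussian construction, the target dimension k, and the seed are illustrative assumptions, not the study's reported settings.

```python
import numpy as np


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Commonly quoted Johnson-Lindenstrauss target dimension:
    k >= 4 ln(n) / (eps^2 / 2 - eps^3 / 3), rounded up."""
    denom = eps ** 2 / 2 - eps ** 3 / 3
    return int(np.ceil(4 * np.log(n_samples) / denom))


def gaussian_random_projection(X: np.ndarray, k: int, seed: int = 0):
    """Map each row of X (n_samples x d) to k dimensions with a random matrix R.

    Entries of R are i.i.d. N(0, 1/k), so E[||x R||^2] = ||x||^2 and pairwise
    squared distances are preserved in expectation.
    """
    rng = np.random.default_rng(seed)
    R = rng.normal(loc=0.0, scale=1.0 / np.sqrt(k), size=(X.shape[1], k))
    return X @ R, R
```

Because R does not depend on the data, the same matrix can be reused in every cross-validation fold, whereas PCA components must be re-estimated from each training set. For 188 cases and eps = 0.5, the bound in jl_min_dim already asks for 252 dimensions, so on small radiomic datasets the target dimension is, in practice, typically treated as a tuning parameter rather than derived from the guarantee.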
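With 141 PM and 47 non-PM cases, balancing the classes by oversampling the minority class means generating 94 synthetic non-PM samples. The sketch below implements the basic SMOTE interpolation rule in NumPy, assuming Euclidean nearest neighbours within the minority class and k = 5 neighbours; the function name smote is hypothetical, and this is not the implementation used in the study (libraries such as imbalanced-learn provide a tested version).

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 0) -> np.ndarray:
    """Basic SMOTE: interpolate between a random minority sample and one of
    its k nearest minority-class neighbours (Euclidean distance)."""
    if X_min.shape[0] < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    k = min(k, n - 1)
    # Pairwise squared distances within the minority class only.
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a sample is never its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)  # starting sample for each new point
    nb = neighbours[base, rng.integers(0, k, size=n_new)]  # chosen neighbour
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])
```

Each synthetic point lies on the segment between two real minority cases. Synthetic samples are normally generated from the training fold only, so that a held-out case never influences the points the model is trained on.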
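The evaluation protocol (dimensionality reduction, logistic regression, leave-one-case-out cross-validation, and AUC) can be sketched end to end in NumPy. All function names, the gradient-descent logistic regression, its L2 penalty and learning rate, the per-fold z-scoring, and the choice of k are assumptions for illustration; SMOTE is omitted to keep the block short. PCA is re-estimated inside every fold so that the held-out case never influences the projection.

```python
import numpy as np


def fit_logreg(X, y, lr=0.1, n_iter=500, l2=1e-2):
    """L2-regularised logistic regression fitted by batch gradient descent."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])  # prepend a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        grad = Xb.T @ (p - y) / len(y) + l2 * np.r_[0.0, w[1:]]  # bias not penalised
        w -= lr * grad
    return w


def predict_proba(w, X):
    return 1.0 / (1.0 + np.exp(-(w[0] + X @ w[1:])))


def pca_fit(X, k):
    """Return a function projecting onto the top-k principal components of X."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return lambda Z: (Z - mu) @ Vt[:k].T


def rp_fit(X, k, seed=0):
    """Return a Gaussian random projection to k dimensions (X only supplies d)."""
    R = np.random.default_rng(seed).normal(0.0, 1.0 / np.sqrt(k), (X.shape[1], k))
    return lambda Z: Z @ R


def auc(y, s):
    """Mann-Whitney AUC: P(score of a positive > score of a negative), ties count 1/2."""
    diff = s[y == 1][:, None] - s[y == 0][None, :]
    return (diff > 0).mean() + 0.5 * (diff == 0).mean()


def loco_scores(X, y, reducer, k):
    """Leave-one-case-out: each case is scored by a model that never saw it."""
    n = len(y)
    scores = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        mu, sd = X[train].mean(axis=0), X[train].std(axis=0) + 1e-12
        Xtr, Xte = (X[train] - mu) / sd, (X[i:i + 1] - mu) / sd  # z-score with training stats
        project = reducer(Xtr, k)  # PCA is re-estimated on every training fold
        w = fit_logreg(project(Xtr), y[train])
        scores[i] = predict_proba(w, project(Xte))[0]
    return scores
```

Calling auc(y, loco_scores(X, y, rp_fit, k)) and auc(y, loco_scores(X, y, pca_fit, k)) on the same feature matrix gives a like-for-like comparison of the two reducers; a formal test of the AUC difference (for example, DeLong's test) would be a separate step.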
