Dual feature selection and rebalancing strategy using metaheuristic optimization algorithms in X-ray image datasets

Class imbalance and high dimensionality are two common problems in medical image datasets, and both degrade the performance of image processing procedures. Solving these two problems with traditional methods is notoriously difficult. Accordingly, this work employed metaheuristic methods to optimize the rebalancing of the imbalanced class distribution, whose output then feeds a feature selection procedure for dimensionality reduction on medical X-ray image datasets. Different metaheuristic algorithms were used to optimize the parameter values of the rebalancing and feature selection phases that preprocess the datasets. The proposed work devised a multi-objective optimization strategy within the metaheuristic search to solve the dual problem of dataset imbalance and feature selection. Afterward, the proposed optimized approach was compared with conventional methods to evaluate its performance. The results established the superiority of the proposed method in overcoming the imbalance and high-dimensionality problems. The proposed method generated a reasonable number of minority class samples and selected a sensible subset of features, moving from a negative kappa value and a falsely high accuracy to a high accuracy with great credibility. It produced more credible and more accurate classification performance on the practical problem of medical X-ray images than other algorithms. Feature selection with Random-SMOTE (RSMOTE) optimized by the self-adaptive Bat algorithm was superior to the same optimization using particle swarm optimization. The proposed method using the Bat algorithm achieved 94.6% classification accuracy with a kappa value of 0.883 on the first lung X-ray dataset.
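The contrast between a "falsely high accuracy" and a non-positive kappa can be illustrated with a small sketch. The code below is not the authors' evaluation code; it assumes the kappa reported here is Cohen's kappa, and the label vectors and function names (`accuracy`, `cohen_kappa`) are hypothetical, chosen only to mimic a 95:5 class imbalance.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)  # observed agreement
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    # Chance agreement from the marginal label frequencies.
    p_e = sum(true_counts[c] * pred_counts[c] for c in labels) / (n * n)
    if p_e == 1.0:
        return 0.0  # degenerate case: only one label on both sides
    return (p_o - p_e) / (1 - p_e)


# Hypothetical imbalanced test set: 95 majority (0) and 5 minority (1) samples.
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts the majority class.
y_majority = [0] * 100

# A classifier that flags two majority samples and still misses every minority sample.
y_poor = [1, 1] + [0] * 98

acc_majority = accuracy(y_true, y_majority)
kappa_majority = cohen_kappa(y_true, y_majority)
acc_poor = accuracy(y_true, y_poor)
kappa_poor = cohen_kappa(y_true, y_poor)

print(f"majority-only: accuracy={acc_majority:.3f}, kappa={kappa_majority:.3f}")
print(f"poor detector: accuracy={acc_poor:.3f}, kappa={kappa_poor:.3f}")
```

In both hypothetical cases accuracy stays above 90% while kappa is zero or slightly negative, which is why accuracy and kappa are best read together on imbalanced data.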
