A hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization for high-dimensional medical data

Because medical data are high-dimensional and their features are strongly correlated, classification accuracy on such data often falls short of expectations. Feature selection is a common way to address this problem: it reduces the dimensionality of the data by keeping only the effective features. However, traditional feature selection algorithms set their thresholds blindly, and their search procedures are liable to fall into local optima. To address these issues, this paper proposes a hybrid feature selection algorithm combining ReliefF and Particle Swarm Optimization. The algorithm has three main parts. First, ReliefF is used to calculate the feature weights, and the features are ranked by weight. Then, the ranked features are grouped by density equalization, so that the density of features in each group is the same. Finally, the Particle Swarm Optimization algorithm is used to search the ranked feature groups, and feature selection is performed according to a new fitness function. Experimental results show that random forest achieves the highest classification accuracy on the selected features and, more importantly, that the selected feature subset is the smallest. In addition, experimental results on 2 medical datasets show that the average accuracy of random forest reaches 90.20%, which suggests that the hybrid algorithm has practical application value.
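The three stages described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the paper's implementation: the weighting step is a simplified Relief variant (one nearest hit and one nearest miss, two classes) rather than full ReliefF; "density equalization" is approximated by equal-sized groups of consecutive ranked features; the fitness function is a common weighted sum of accuracy and feature reduction, because the abstract does not define the new fitness function; and a leave-one-out 1-nearest-neighbor classifier stands in for random forest. All function names, parameters, and the toy data are hypothetical.

```python
import math
import random

def relief_weights(X, y):
    """Simplified Relief-style weights (assumption: not the full ReliefF of the paper)."""
    n, d = len(X), len(X[0])
    # Feature ranges scale differences into [0, 1]; constant features get span 1.
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]
    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]
    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))
    w = [0.0] * d
    for i in range(n):
        hit = min((k for k in range(n) if k != i and y[k] == y[i]),
                  key=lambda k: dist(X[i], X[k]))
        miss = min((k for k in range(n) if y[k] != y[i]),
                   key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward separation between classes, penalize spread within a class.
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / n
    return w

def equal_groups(ranked, n_groups):
    """Split a ranked list into consecutive groups of near-equal size
    (assumed stand-in for density equalization)."""
    size, extra = divmod(len(ranked), n_groups)
    groups, start = [], 0
    for g in range(n_groups):
        end = start + size + (1 if g < extra else 0)
        groups.append(ranked[start:end])
        start = end
    return groups

def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out 1-NN accuracy on the chosen features (stand-in classifier)."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum((X[i][j] - X[k][j]) ** 2 for j in feats))
        correct += y[nearest] == y[i]
    return correct / len(X)

def fitness(bits, groups, X, y, alpha=0.9):
    """Assumed fitness: weighted sum of accuracy and fraction of features dropped."""
    feats = [f for b, g in zip(bits, groups) if b for f in g]
    total = sum(len(g) for g in groups)
    return alpha * loo_1nn_accuracy(X, y, feats) + (1 - alpha) * (1 - len(feats) / total)

def binary_pso(groups, X, y, n_particles=8, n_iter=15, seed=0):
    """Binary PSO over group on/off bits, using a sigmoid transfer function."""
    rng = random.Random(seed)
    dim = len(groups)
    pos = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p, groups, X, y) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(dim):
                v = (0.7 * vel[i][j]
                     + 1.5 * rng.random() * (pbest[i][j] - pos[i][j])
                     + 1.5 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-4.0, min(4.0, v))  # clamp velocity
                pos[i][j] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][j])) else 0
            f = fitness(pos[i], groups, X, y)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Toy data (hypothetical): feature 0 equals the class label, features 1-5 are noise.
data_rng = random.Random(0)
y = [i % 2 for i in range(40)]
X = [[float(label)] + [data_rng.random() for _ in range(5)] for label in y]

weights = relief_weights(X, y)
ranked = sorted(range(len(weights)), key=lambda j: -weights[j])  # best feature first
groups = equal_groups(ranked, 3)
best_bits, best_fit = binary_pso(groups, X, y)
selected = [f for b, g in zip(best_bits, groups) if b for f in g]
print("ranked features:", ranked)
print("selected features:", selected, "fitness:", round(best_fit, 4))
```

Searching over groups rather than individual features shrinks the particle dimension from the number of features to the number of groups, which is what makes the swarm search tractable on high-dimensional data; the trade-off is that features within a group are kept or dropped together.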
