A hybrid approach using rough set theory and hypergraph for feature selection on high-dimensional medical datasets

Abstract: The "curse of dimensionality", driven by the massive generation of high-dimensional medical datasets from various biomedical applications, complicates the data analysis needed for precise medical diagnosis. An efficient feature selection technique that finds the optimal feature subset is a prominent solution to this challenge; it also improves the accuracy and reduces the computational complexity of the learning model. State-of-the-art feature selection techniques based on heuristic and statistical functions face significant limitations in classification accuracy, time complexity, and related measures. Hence, this paper presents a Rough Set Theory and Hypergraph (RSHGT)-based feature selection technique to identify the optimal feature subset for accurate medical diagnosis. Experimental validation on six medical datasets from the Kent Ridge Biomedical dataset repository demonstrates the efficiency of RSHGT in terms of reduct size, accuracy, precision, recall, and time complexity.
