Hybrid System based on Rough Sets and Genetic Algorithms for Medical Data Classifications

Computational intelligence provides significant support to the biomedical domain. The application of machine learning techniques in medicine has evolved from physicians' needs. Screening, medical imaging, pattern classification, and prognosis are some examples of health care support systems. Medical data typically has its own characteristics, such as large size, many features, and continuous, real-valued attributes that describe patients' investigations. Therefore, discretization and feature selection are key issues in improving the knowledge extracted from patients' investigation records. In this paper, a hybrid system that integrates Rough Sets (RS) and a Genetic Algorithm (GA) is presented for the efficient classification of medical data sets of different sizes and dimensionalities. The GA is applied to reduce the dimensionality of the medical data sets, and RS decision rules are used for classification. Furthermore, the proposed system applies Entropy Gain Information (EI) for discretization. Four biomedical data sets were tested with the proposed system, EI-GA-RS, and it obtained the highest score on three of them. Other hybrid techniques matched the proposed technique's highest accuracy, but the proposed system remains among the top-scoring systems for three of the data sets. EI as the discretization technique is also a common component of the best results on these data sets, while RS as the evaluator achieved the best results on three data sets.
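
The abstract does not spell out how the entropy-based discretization step works, so the following is only a minimal sketch of the general idea, assuming a single binary cut point chosen by maximum information gain. The function names (`entropy`, `best_cut`, `discretize`) and the toy values are hypothetical and are not taken from the paper; a full method would typically apply the split recursively with a stopping criterion.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Return (cut_point, information_gain) for the best single binary split.

    Candidate cuts are midpoints between consecutive distinct sorted values.
    Returns (None, 0.0) when no split improves on the unsplit entropy.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    n = len(pairs)
    base = entropy([y for _, y in pairs])
    best_gain, best_point = 0.0, None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no boundary between equal attribute values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best_gain:
            best_gain = gain
            best_point = (pairs[i - 1][0] + pairs[i][0]) / 2
    return best_point, best_gain


def discretize(values, cut):
    """Map each continuous value to interval 0 (<= cut) or 1 (> cut)."""
    return [0 if v <= cut else 1 for v in values]


# Toy attribute (hypothetical values) with a binary class label.
measurements = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
diagnosis = [0, 0, 0, 1, 1, 1]
cut, gain = best_cut(measurements, diagnosis)
codes = discretize(measurements, cut)
```

In a pipeline like the one described, the discretized attributes would then be passed to the feature selection and rule induction stages; those stages are not sketched here.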
