Undiagnosed samples aided rough set feature selection for medical data

Medical data often consist of a large number of disease markers. For medical data analysis, some disease markers are unhelpful and can even degrade the results. Feature selection is therefore necessary, as it removes these unimportant markers. Among the many feature selection methods, rough set based feature selection (RSFS) has been widely used. Unlike other methods, RSFS is completely data-driven: it does not require additional information such as probability distributions. Traditional RSFS methods extract information only from diagnosed samples, so they usually require a large number of diagnosed samples to achieve good feature selection performance. In many real medical applications, however, diagnosed samples are limited while undiagnosed samples are plentiful. Motivated by semi-supervised learning, in this paper we propose a novel RSFS method that learns from both diagnosed and undiagnosed samples. This method is called undiagnosed samples aided rough set feature selection (USA-RSFS). Its main benefit is that it reduces the number of diagnosed samples required by drawing on undiagnosed ones. Finally, the promising performance of USA-RSFS is validated through a set of experiments on medical datasets.
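To make the data-driven character of traditional RSFS concrete, the following is a minimal sketch of the classical rough set dependency degree together with a greedy, QuickReduct-style search. It uses diagnosed (labeled) samples only and is not the USA-RSFS method proposed here. The function names, the toy markers, and the labels are all hypothetical and chosen purely for illustration; markers are assumed to be discrete.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group sample indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows, labels, attrs):
    """Dependency degree: fraction of samples whose equivalence class
    (under attrs) contains a single diagnosis, i.e. the positive region."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)


def quick_reduct(rows, labels):
    """Greedily add the marker that most increases the dependency degree
    until it matches the dependency degree of the full marker set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    current = dependency(rows, labels, selected)
    while current < target:
        best_attr, best_gamma = None, current
        for a in all_attrs:
            if a in selected:
                continue
            gamma = dependency(rows, labels, selected + [a])
            if gamma > best_gamma:
                best_attr, best_gamma = a, gamma
        if best_attr is None:
            # No single marker improves the dependency; the greedy search stalls.
            break
        selected.append(best_attr)
        current = best_gamma
    return selected


# Hypothetical toy data: three discrete markers per diagnosed sample.
rows = [(0, 0, 1), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = ["healthy", "healthy", "sick", "sick"]
print(quick_reduct(rows, labels))
```

The sketch needs nothing beyond the samples and their diagnoses, which is the sense in which RSFS is data-driven. It also shows why few diagnosed samples are a problem: the dependency degree is estimated only from labeled rows, so with little labeled data many markers can look equally informative.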
