Selection of Important Attributes for Medical Diagnosis Systems

The success of machine learning algorithms usually depends on the quality of the dataset they operate on. For datasets containing noisy, inadequate, or irrelevant information, these algorithms may produce less accurate results. A common pre-processing step in data mining is therefore the selection of highly predictive attributes. In this case study we select subsets of attributes from medical data using filter feature selection algorithms. To validate the algorithms, we induce decision rules from the selected subsets of attributes and compare classification accuracy on both the training and test datasets. Additionally, the medical relevance of the selected attributes is checked with the help of domain experts.
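As a rough illustration of what a filter approach does, the sketch below ranks discrete attributes by information gain with respect to the class label and keeps the top k, without consulting any classifier. Information gain is an assumption chosen for illustration; the abstract does not name the filter measures actually used. The function names and the toy patient records are likewise hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in label entropy after splitting on one discrete attribute."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_attributes(rows, labels, k):
    """Filter step: rank attributes by information gain and keep the top k."""
    attrs = list(rows[0].keys())
    ranked = sorted(attrs, key=lambda a: information_gain(rows, labels, a), reverse=True)
    return ranked[:k]


# Hypothetical toy records; attribute names and values are invented for illustration.
rows = [
    {"fever": "yes", "cough": "yes", "ward": "A"},
    {"fever": "yes", "cough": "no", "ward": "B"},
    {"fever": "no", "cough": "yes", "ward": "A"},
    {"fever": "no", "cough": "no", "ward": "B"},
]
labels = ["sick", "sick", "healthy", "healthy"]

selected = select_attributes(rows, labels, k=1)
print(selected)
```

In this toy data only `fever` separates the two classes, so it is the attribute retained. In a study like the one described, the retained subset would then be passed to a rule induction step and evaluated on held-out test data.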
