Medical data mining by fuzzy modeling with selected features

OBJECTIVE Medical data is often very high dimensional. Depending upon the use, some data dimensions might be more relevant than others. In processing medical data, choosing the optimal subset of features is important, not only to reduce the processing cost but also to improve the usefulness of the model built from the selected data. This paper presents a data mining study of medical data with fuzzy modeling methods that use feature subsets selected by various indices and methods.

METHODS Three fuzzy modeling methods are employed: the fuzzy k-nearest neighbor algorithm, a fuzzy clustering-based modeling method, and the adaptive network-based fuzzy inference system. For feature selection, a total of 11 indices/methods are used. The medical data mined include the Wisconsin breast cancer dataset and the Pima Indians diabetes dataset. The classification accuracy and computational time are reported. To show how good the best performer is, the global optimum was also found by exhaustively testing all possible three-feature subsets.

RESULTS For the Wisconsin breast cancer dataset, the best accuracy obtained was 97.17%, which is only 0.25% lower than that obtained by exhaustive testing. For the Pima Indians diabetes dataset, the best accuracy obtained was 77.65%, which is only 0.13% lower than that obtained by exhaustive testing.

CONCLUSION This paper has shown that feature selection is important in mining medical data, both for reducing processing time and for increasing classification accuracy. However, not all combinations of feature selection and modeling methods are equally effective, and the best combination is often data-dependent, as supported by the breast cancer and diabetes data analyzed in this paper.
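The exhaustive three-feature baseline described in the abstract can be illustrated with a minimal sketch. It pairs a simplified fuzzy k-nearest neighbor classifier (crisp training labels, inverse-distance weighting with fuzzifier m) with a search over every three-feature subset. The leave-one-out scoring, the synthetic data, and all function names here are assumptions made for illustration; the abstract does not state the paper's evaluation protocol or parameter settings.

```python
import math
import random
from itertools import combinations


def fuzzy_knn_memberships(train_X, train_y, x, k=3, m=2.0):
    """Class memberships of x from its k nearest neighbors.

    Simplified fuzzy k-NN: training labels are crisp, and each neighbor
    is weighted by 1 / distance**(2 / (m - 1)).
    """
    neighbors = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    eps = 1e-12  # avoids division by zero for coincident points
    weights = {}
    for d, label in neighbors:
        w = 1.0 / (d ** (2.0 / (m - 1.0)) + eps)
        weights[label] = weights.get(label, 0.0) + w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


def loo_accuracy(X, y, feature_idx, k=3):
    """Leave-one-out accuracy using only the columns in feature_idx."""
    proj = [[row[j] for j in feature_idx] for row in X]
    correct = 0
    for i, sample in enumerate(proj):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        mu = fuzzy_knn_memberships(train_X, train_y, sample, k=k)
        if max(mu, key=mu.get) == y[i]:
            correct += 1
    return correct / len(proj)


def exhaustive_search(X, y, subset_size=3, k=3):
    """Score every feature subset of the given size; return (accuracy, subset)."""
    n_features = len(X[0])
    scored = [
        (loo_accuracy(X, y, subset, k), subset)
        for subset in combinations(range(n_features), subset_size)
    ]
    return max(scored)  # ties are broken by the subset tuple


# Hypothetical toy data: features 0 and 1 carry the class signal,
# features 2 to 4 are pure noise.
rng = random.Random(0)
X, y = [], []
for _ in range(40):
    label = rng.randint(0, 1)
    informative = [3.0 * label + rng.gauss(0.0, 0.5) for _ in range(2)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(3)]
    X.append(informative + noise)
    y.append(label)

best_acc, best_subset = exhaustive_search(X, y)
print("best subset:", best_subset, "accuracy:", round(best_acc, 4))
```

The number of candidates grows combinatorially with the feature count (for example, 9 features give 84 three-feature subsets), which is why exhaustive testing serves here as a reference point rather than as a practical selection method.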
