Enhancing the Performance of LibSVM Classifier by Kernel F-Score Feature Selection

Medical data mining is the search for relationships and patterns within medical datasets that could provide useful knowledge for effective clinical decisions. Including irrelevant, redundant and noisy features in the model-building process can result in poor predictive accuracy. Much research in data mining has therefore gone into improving the predictive accuracy of classifiers through feature selection. Feature selection is particularly valuable in medical data mining because it allows a disease to be diagnosed in the patient-care setting from a minimum number of significant features. The objective of this work is to show that selecting the more significant features improves the performance of the classifier. We empirically evaluate the classification effectiveness of the LibSVM classifier on a reduced feature subset of a diabetes dataset. The evaluations suggest that the selected feature subset improves the predictive accuracy of the classifier and reduces both false negatives and false positives.
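The abstract does not spell out the kernel F-score procedure named in the title. As a rough illustration of filter-style feature ranking ahead of an SVM, the following sketch computes the plain (non-kernel) two-class F-score of each feature: the squared distances of the two class means from the overall mean, divided by the sum of the two within-class sample variances. It then keeps the k highest-scoring features. This is a stand-in under stated assumptions, not the authors' kernel F-score method: the function names `f_scores` and `select_top_k` are hypothetical, labels are assumed to be +1/-1, and training LibSVM on the reduced columns is not shown.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Plain two-class F-score for every feature (column) of X.

    Hypothetical helper, not the paper's kernel F-score.
    X: list of samples, each a list of numeric feature values.
    y: list of class labels, assumed to be +1 (positive) or -1 (negative).
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == -1]
    if len(pos) < 2 or len(neg) < 2:
        raise ValueError("need at least two samples in each class")

    scores = []
    for i in range(len(X[0])):
        col_all = [row[i] for row in X]
        col_pos = [row[i] for row in pos]
        col_neg = [row[i] for row in neg]
        m_all, m_pos, m_neg = mean(col_all), mean(col_pos), mean(col_neg)

        # Between-class separation: how far each class mean lies from the overall mean.
        numerator = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        # Within-class spread: sample variances (n - 1 denominator) of each class.
        denominator = variance(col_pos) + variance(col_neg)

        if denominator > 0:
            scores.append(numerator / denominator)
        else:
            # Constant within each class: perfectly separating if the means differ.
            scores.append(float("inf") if numerator > 0 else 0.0)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first (ties keep column order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy data: column 0 separates the classes, column 1 is noise, column 2 is constant.
    X = [[5.0, 1.0, 3.0], [6.0, 2.0, 3.0], [1.0, 2.0, 3.0], [2.0, 1.0, 3.0]]
    y = [1, 1, -1, -1]
    s = f_scores(X, y)
    print(s)                   # [8.0, 0.0, 0.0]
    print(select_top_k(s, 1))  # [0]
```

In a workflow like the one the abstract describes, the retained column indices would define the reduced feature subset passed to the SVM. A higher F-score suggests a more discriminative feature, but the plain score looks at each feature in isolation and ignores interactions between features.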
