A predictive model using improved Normalized Point Wise Mutual Information (INPMI)

In machine learning, selecting the optimal features for a classifier is a critical problem. To address it, a novel feature selection method named “Improved Normalized Pointwise Mutual Information (INPMI)” is proposed. Coupled with sequential forward search (SFS), the proposed INPMI method finds the best feature subset. The two key properties used to evaluate a feature subset, relevancy and redundancy, are both analysed. Naive Bayes, Support Vector Machine and J48 classifiers are used to measure the accuracy obtained with the selected features. Experimental results on benchmark medical datasets from the UCI (University of California, Irvine) machine learning repository show that the proposed INPMI-NB, INPMI-SVM and INPMI-J48 models with SFS achieve 98.36 %, 98.90 % and 94.53 % classification accuracy, respectively, and select 22 features for the erythemato-squamous diseases dataset. The proposed work is also evaluated on a World Aircraft dataset to assess its generalization ability. The experimental results indicate that the proposed INPMI method outperforms the existing systems.
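The abstract does not state the INPMI formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It uses the standard normalized pointwise mutual information, ln(p(x,y) / (p(x) p(y))) divided by -ln p(x,y), averaged over the observed value pairs, and a greedy forward search that scores each candidate by relevance to the class minus mean redundancy with the features already chosen. The averaging step, the relevance-minus-redundancy score, and the names `mean_npmi` and `forward_select` are all assumptions introduced for illustration.

```python
import math
from collections import Counter


def mean_npmi(xs, ys):
    """Expected standard NPMI between two discrete sequences.

    Assumption: per-pair NPMI values are averaged with weights p(x, y).
    This is NOT the paper's improved variant, whose formula is not given.
    """
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    total = 0.0
    for (x, y), count in pxy.items():
        p_xy = count / n
        if p_xy == 1.0:
            continue  # both variables are constant; NPMI is undefined, treat as 0
        pmi = math.log(p_xy / ((px[x] / n) * (py[y] / n)))
        total += p_xy * pmi / -math.log(p_xy)
    return total


def forward_select(features, labels, k):
    """Greedy sequential forward search over a dict of named feature columns.

    Assumption: a candidate's score is its relevance to the labels minus its
    mean redundancy with the already selected features.
    """
    selected, remaining = [], list(features)

    def gain(name):
        relevance = mean_npmi(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mean_npmi(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    # Toy data: the class is determined jointly by two independent features.
    a = [0, 0, 1, 1, 0, 0, 1, 1]
    b = [0, 1, 0, 1, 0, 1, 0, 1]
    labels = [2 * u + v for u, v in zip(a, b)]
    toy = {"a": a, "a_copy": list(a), "b": b}
    print(forward_select(toy, labels, k=2))
```

In this toy setting the duplicated column is penalized for redundancy, so the search prefers the complementary feature as its second pick. A wrapper variant would instead score each candidate subset by the cross-validated accuracy of a classifier such as Naive Bayes, SVM or J48.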
