Analysis of Feature Selection Algorithms on Classification: A Survey

1. INTRODUCTION

Data mining is a process of knowledge discovery. Knowledge discovery in databases (KDD) is an automated process of discovering knowledge from raw data. KDD consists of several steps: data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation. Among these steps, data selection is especially important because it retains the relevant features and removes the irrelevant attributes. Classification is a data mining technique used to predict the unknown class of an instance. The classification methods used in data mining include Bayesian classification (a statistical classifier), decision tree induction, rule-based classification (IF-THEN rules), classification using backpropagation (a neural network algorithm), support vector machines, classification using association rules, k-nearest neighbor classifiers, case-based reasoning classifiers, the rough set approach, genetic algorithms and the fuzzy set approach.
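As a rough illustration of how a data selection step can feed a classifier, the following minimal sketch ranks features with a simple filter and then classifies on the reduced data. The choice of absolute Pearson correlation as the relevance score and of a 1-nearest-neighbor classifier is an assumption made for this example only, as are the function names (`select_features`, `project`, `predict_1nn`) and the toy data; none of these come from the survey itself.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(rows, labels, k):
    """Filter-style selection: keep the k features most correlated with the class label."""
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    top = sorted(scores, key=lambda s: (-s[0], s[1]))[:k]
    return sorted(j for _, j in top)


def project(rows, keep):
    """Drop every feature whose index is not in keep."""
    return [[row[j] for j in keep] for row in rows]


def predict_1nn(train_rows, train_labels, query):
    """Return the label of the training row closest to query (Euclidean distance)."""
    best = min(range(len(train_rows)),
               key=lambda i: math.dist(train_rows[i], query))
    return train_labels[best]


# Hypothetical toy data: feature 0 tracks the class, feature 1 is noise,
# feature 2 is constant and therefore carries no information.
rows = [
    [1.0, 7.0, 3.0],
    [1.2, 2.0, 3.0],
    [0.9, 5.0, 3.0],
    [3.1, 6.0, 3.0],
    [2.9, 1.0, 3.0],
    [3.0, 4.0, 3.0],
]
labels = [0, 0, 0, 1, 1, 1]

keep = select_features(rows, labels, k=1)
reduced = project(rows, keep)
query = project([[2.8, 6.5, 3.0]], keep)[0]
prediction = predict_1nn(reduced, labels, query)
```

Here the filter is applied once, before and independently of the classifier, so any of the classification methods listed above could be substituted for the nearest-neighbor step without changing the selection code.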
