Feature Selection on High Dimensional Data Using Wrapper Based Subset Selection

In recent years, feature subset selection and classification in high-dimensional data have become a major challenge for researchers. The main aim of feature subset selection is to find the most informative features among the vast number of features in high-dimensional data. Filter, wrapper, and embedded methods are currently used to address this problem. In this paper, we apply a wrapper-based subset selection technique to select a feature subset from high-dimensional datasets. The symmetrical uncertainty measure is used to weight the features and identify the predominant ones. To find the optimal threshold value on these weights, candidate feature subsets are passed to the classifier iteratively until the maximum accuracy is obtained. To validate the algorithm, we use 10-fold cross-validation with two standard classifiers, Naive Bayes and Support Vector Machine (SVM), and the results are tabulated and compared. The comparison shows that the proposed method achieves better classification accuracy.
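The weighting and threshold search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions: features are discrete, symmetrical uncertainty is computed in its usual form SU(X, Y) = 2 I(X; Y) / (H(X) + H(Y)), and the classifier is hidden behind a hypothetical `evaluate` callback that returns an accuracy for a given feature subset (in the paper's setting this would be 10-fold cross-validated accuracy of Naive Bayes or SVM). The names `symmetrical_uncertainty`, `select_by_threshold`, and `thresholds` are introduced here for illustration only.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def symmetrical_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        # Both variables are constant, so there is nothing to share.
        return 0.0
    joint = entropy(list(zip(x, y)))
    mutual_information = hx + hy - joint
    return 2.0 * mutual_information / (hx + hy)


def select_by_threshold(features, labels, evaluate, thresholds):
    """Try each candidate threshold on the SU weights and keep the best subset.

    features:   dict mapping feature name -> list of discrete values
    labels:     list of class labels, same length as each feature column
    evaluate:   hypothetical callback, subset of names -> accuracy
    thresholds: iterable of candidate SU cut-off values (an assumption)
    """
    weights = {
        name: symmetrical_uncertainty(column, labels)
        for name, column in features.items()
    }
    best_accuracy, best_threshold, best_subset = -1.0, None, []
    for t in thresholds:
        subset = [name for name, w in weights.items() if w >= t]
        if not subset:
            continue  # threshold too strict, no features survive
        accuracy = evaluate(subset)
        if accuracy > best_accuracy:
            best_accuracy, best_threshold, best_subset = accuracy, t, subset
    return weights, best_threshold, best_subset, best_accuracy
```

In this sketch the classifier is consulted once per candidate threshold rather than once per feature, which keeps the wrapper step affordable when the number of features is very large. How the candidate thresholds are generated and when the search stops are choices the abstract does not specify.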
