A Comparative Study of Filter and Wrapper Methods on EDHS – HIV/AIDS Dataset

Feature selection is widely recognized as a valuable step in machine learning, particularly for tasks with many features, and it can be key to successful knowledge discovery when the feature set is massive. Many feature selection techniques have been proposed over the last few decades, each with its own advantages and drawbacks. Despite this intensive study, the different families of feature selection methods have not yet been evaluated in the HIV/AIDS setting, and this article aims to fill that gap. It presents an empirical comparison of filter and wrapper feature selection methods on the EDHS dataset, adapting existing filter and wrapper techniques and running experiments with six machine learning algorithms whose performance is measured by accuracy and ROC metrics. The results confirm the effectiveness of feature selection for this type of problem, with the wrapper method proving superior to filters: the LR, SVM, DT, and RF classifiers achieved higher accuracy with the wrapper method, while KNN and GB achieved the same results under both. On the ROC metric, the wrapper method likewise scored better than the filter method. Overall, the wrapper method outperformed both the filter method and the original feature set.
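The abstract does not specify the exact pipeline, so the sketch below is only an illustration of the filter-versus-wrapper comparison it describes, using scikit-learn on synthetic data (the EDHS data is not reproduced here) with logistic regression standing in for one of the six classifiers. `SelectKBest` plays the filter role (classifier-independent scoring) and `RFE` the wrapper role (selection driven by the classifier itself); the feature counts and dataset parameters are arbitrary assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the EDHS data: 20 features, 5 of them informative.
X, y = make_classification(n_samples=500, n_features=20,
                           n_informative=5, random_state=0)

clf = LogisticRegression(max_iter=1000)

# Filter method: rank features by ANOVA F-score, independent of any classifier.
X_filter = SelectKBest(score_func=f_classif, k=5).fit_transform(X, y)

# Wrapper method: recursive feature elimination guided by the classifier.
X_wrapper = RFE(estimator=clf, n_features_to_select=5).fit_transform(X, y)

acc_filter = cross_val_score(clf, X_filter, y, cv=5, scoring="accuracy").mean()
acc_wrapper = cross_val_score(clf, X_wrapper, y, cv=5, scoring="accuracy").mean()
print(f"filter accuracy:  {acc_filter:.3f}")
print(f"wrapper accuracy: {acc_wrapper:.3f}")
```

Both selectors reduce the data to the same number of features, so the cross-validated accuracies are directly comparable, mirroring the accuracy comparison reported in the paper.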
