Feature Subset Selection for Irrelevant Data Removal Using a Decision Tree Algorithm

Feature subset selection is an effective way to reduce dimensionality, remove irrelevant data, and improve predictive accuracy. It can be viewed as the process of identifying and removing as many irrelevant and redundant features as possible: 1) irrelevant features do not contribute to predictive accuracy, and 2) redundant features do not help build a better predictor because they provide mostly information that is already present in other feature(s). Both irrelevant and redundant features can severely degrade the accuracy of learning machines. In this paper, particular attention is paid to feature selection for classification. An algorithm is used that ranks attributes according to their significance; the ranked attributes are then given as input to a simple decision-tree-building algorithm (an oblivious tree). Results show that a decision tree built on the features chosen by the proposed algorithm outperforms a decision tree built without feature selection. The experimental results also show that the procedure produces a smaller tree with acceptable accuracy: the decision-tree-based selection method achieved 85.87% accuracy on the evaluated datasets when compared with other techniques.
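The abstract does not specify the paper's exact significance criterion, so the following is only a minimal sketch of the general idea it describes: score each attribute, rank the attributes by that score, and keep the top-ranked ones before tree induction. Information gain is used here as a stand-in significance measure (a common choice for decision trees; the paper's criterion may differ), and the toy dataset, feature names, and `info_gain` helper are illustrative assumptions, not from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, f):
    """Information gain of splitting the dataset on feature index f."""
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[f], []).append(y)
    # weighted entropy remaining after the split
    remainder = sum(len(ys) / len(labels) * entropy(ys) for ys in parts.values())
    return entropy(labels) - remainder

# Toy dataset (hypothetical): features are (outlook, windy, shirt_color).
# Only "outlook" predicts the class; the other two are irrelevant.
rows = [
    ("sunny", "no",  "red"),
    ("sunny", "yes", "blue"),
    ("rain",  "no",  "blue"),
    ("rain",  "yes", "red"),
]
labels = ["play", "play", "stay", "stay"]

# Rank feature indices by significance (highest information gain first),
# then keep the top k as the selected subset.
ranked = sorted(range(3), key=lambda f: info_gain(rows, labels, f), reverse=True)
k = 1
selected = ranked[:k]  # the irrelevant features are dropped
```

In this sketch, `outlook` (feature 0) gets the maximal gain of 1 bit while the two irrelevant features score 0, so only feature 0 survives the cut; a tree induced on the selected subset would then be both smaller and at least as accurate, which mirrors the effect the abstract reports.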
