A comparative study on the effect of feature selection on classification accuracy

Abstract Feature selection has become of interest to many research areas that deal with machine learning and data mining, because it enables classifiers to be faster, more cost-effective, and more accurate. This paper presents the effect of feature selection on the accuracy of the NaiveBayes, Artificial Neural Network (as Multilayer Perceptron), and J48 decision tree classifiers. The classifiers are compared on fifteen real datasets that are pre-processed with feature selection methods. An improvement of up to 15.55% in classification accuracy is observed, and Multilayer Perceptron appears to be the classifier most sensitive to feature selection.
