An Empirical Evaluation of Feature Selection Methods

The objective of this chapter is to study how feature selection affects the classification accuracy of machine learning algorithms. Feature selection reduces the dimensionality of the data and can improve the accuracy of the learning algorithm. We test how integrating feature selection affects the accuracy of three classifiers by applying several feature selection methods. Among the filters, Information Gain (IG), Gain Ratio (GR), and Relief-F, and among the wrappers, Bagging and Naive Bayes (NB), gave the classifiers the largest average increase in classification accuracy while reducing the number of unnecessary attributes. These conclusions can advise machine learning practitioners which classifier and feature selection methods to use to optimize classification accuracy. This is especially important in risk-sensitive applications of machine learning, where one aim is to reduce the costs of collecting, processing, and storing unnecessary data.
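To make the filter approach concrete, the sketch below implements the Information Gain (IG) score mentioned above from scratch: the gain of a discrete feature is the entropy of the class labels minus the conditional entropy of the labels given the feature. This is an illustrative toy implementation with a made-up two-feature dataset, not the chapter's actual experimental setup or datasets.

```python
from collections import Counter
import math

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(class; feature) = H(class) - H(class | feature) for a discrete feature."""
    n = len(labels)
    conditional = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(labels) - conditional

# Hypothetical toy data: feature A perfectly predicts the class, feature B is noise.
labels = ['yes', 'yes', 'no', 'no']
feat_a = ['x', 'x', 'y', 'y']   # informative: IG = 1.0 bit
feat_b = ['p', 'q', 'p', 'q']   # uninformative: IG = 0.0 bits
print(information_gain(feat_a, labels))  # 1.0
print(information_gain(feat_b, labels))  # 0.0
```

A filter method ranks all attributes by such a score and keeps only the top-ranked ones before training, independent of the classifier; a wrapper instead evaluates candidate feature subsets by the accuracy of the target classifier itself.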
