A Hybrid Feature Selection Method for Classification Purposes

This paper presents a novel combination of filter feature selection algorithms for classification problems. Feature selection is one of the most important issues in pattern recognition, machine learning and computer vision. Its main objectives are to reduce dimensionality, to improve the performance of machine learning models and to make the process more comprehensible. Exhaustive search is the only method guaranteed to find the optimal subset, but its computational time complexity is exponential in the number of variables. In this paper, the set of available variables is first reduced using a combination of filter selection methods, and exhaustive search is then performed on the reduced set in order to obtain a sub-optimal set of variables in a reasonable time. The proposed approach is tested on several commonly used datasets from the UCI repository and on two datasets coming from an industrial context.
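The abstract does not say which filter methods are combined, how their outputs are merged, or which classifier scores the candidate subsets. The sketch below illustrates the two-stage idea only, under assumptions of our own: two filters (Fisher score, and absolute Pearson correlation with the class label, which is meaningful mainly for binary labels) are combined by averaging each feature's rank, and the k best-ranked features are then searched exhaustively, scoring each subset by leave-one-out 1-nearest-neighbor accuracy. All function names and the toy data are hypothetical. The motivation is the cost: exhaustive search over n features evaluates 2^n - 1 non-empty subsets, while after the filter stage only 2^k - 1 remain (for example, 1,048,575 for n = 20 versus 255 for k = 8).

```python
import itertools
import math
from statistics import mean, pvariance


def fisher_score(column, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(column)
    between = within = 0.0
    for c in set(labels):
        values = [x for x, y in zip(column, labels) if y == c]
        between += len(values) * (mean(values) - overall) ** 2
        within += len(values) * pvariance(values)
    if within == 0:
        # No spread inside any class: perfectly separating, or simply constant.
        return math.inf if between > 0 else 0.0
    return between / within


def abs_correlation(column, labels):
    """Absolute Pearson correlation between a feature and numeric class labels."""
    mx, my = mean(column), mean(labels)
    cov = sum((x - mx) * (y - my) for x, y in zip(column, labels))
    sx = math.sqrt(sum((x - mx) ** 2 for x in column))
    sy = math.sqrt(sum((y - my) ** 2 for y in labels))
    return abs(cov / (sx * sy)) if sx > 0 and sy > 0 else 0.0


def rank_positions(scores):
    """Map each feature index to its rank under one filter (0 = best)."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    return {j: r for r, j in enumerate(order)}


def combined_filter(X, y, k):
    """Stage 1 (assumed rule): keep the k features with the best mean rank."""
    columns = list(zip(*X))
    filters = [fisher_score, abs_correlation]  # hypothetical choice of filters
    rankings = [rank_positions([f(col, y) for col in columns]) for f in filters]
    avg_rank = {j: mean(r[j] for r in rankings) for j in range(len(columns))}
    return sorted(avg_rank, key=lambda j: (avg_rank[j], j))[:k]


def loo_1nn_accuracy(X, y, subset):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier on `subset`."""
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = math.inf, None
        for j, xj in enumerate(X):
            if i == j:
                continue
            d = sum((xi[f] - xj[f]) ** 2 for f in subset)
            if d < best_dist:
                best_dist, best_label = d, y[j]
        correct += best_label == y[i]
    return correct / len(X)


def hybrid_select(X, y, k):
    """Stage 2: exhaustive search over every non-empty subset of the k survivors."""
    candidates = sorted(combined_filter(X, y, k))
    best_subset, best_acc = (), -1.0
    for size in range(1, len(candidates) + 1):
        for subset in itertools.combinations(candidates, size):
            acc = loo_1nn_accuracy(X, y, subset)
            if acc > best_acc:  # strict '>' keeps the smallest subset among ties
                best_subset, best_acc = subset, acc
    return best_subset, best_acc


# Hypothetical toy data: feature 0 separates the classes, 1 is noise,
# 2 is constant, 3 is weakly informative.
TOY_X = [
    (1.0, 5.0, 3.0, 0.2), (1.2, 1.0, 3.0, 0.8), (0.9, 4.0, 3.0, 0.5),
    (1.1, 2.0, 3.0, 0.4), (3.0, 4.5, 3.0, 0.6), (3.2, 1.5, 3.0, 0.9),
    (2.9, 3.0, 3.0, 0.3), (3.1, 2.5, 3.0, 0.7),
]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(hybrid_select(TOY_X, TOY_Y, k=3))
```

On the toy data the filter stage discards the constant feature 2, and the search then selects feature 0 alone with a leave-one-out accuracy of 1.0. Swapping the filters, the rank-averaging rule or the evaluation classifier yields a different variant, and the paper's own choices may differ from these.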
