A Fuzzy System for Combining Filter Features Selection Methods

Feature selection is considered one of the most important data pre-processing steps in many modelling fields, especially for prediction and classification. It belongs to the wider class of data mining procedures, as it identifies the variables that most affect a given phenomenon from an analysis of the available data, thereby increasing knowledge of the process or phenomenon under consideration. There are three main categories of feature selection approaches, namely filter, wrapper and embedded methods. This work focuses on the first category and, in particular, on a fuzzy logic-based procedure that combines some traditional filter methods. Filter methods exploit intrinsic properties of the data to select features before the learning task and, compared with the other kinds of approaches, require less computational time and are suitable for datasets with a large number of instances and features. To demonstrate the effectiveness of the proposed approach, several tests have been performed. Different classifiers have been designed and applied to binary classification on different datasets: some widely used public datasets with many instances and features, and two datasets from the metal industry. The results are presented and discussed in the paper.
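As a rough illustration of what a single filter method does, the sketch below ranks features for a binary classification task using a Fisher-style score (squared difference of class means divided by the sum of class variances). The choice of this score, the function names `fisher_scores` and `select_top_k`, and the toy data are all assumptions made for illustration. The abstract does not specify which filters are combined, and this sketch does not implement the fuzzy combination itself.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Return one Fisher-style score per feature for binary labels 0/1.

    X is a list of rows (one list of floats per instance),
    y is a list of class labels (0 or 1) of the same length.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        between = (mean(pos) - mean(neg)) ** 2      # separation of class means
        within = pvariance(pos) + pvariance(neg)    # spread inside each class
        scores.append(between / within if within > 0 else 0.0)
    return scores


def select_top_k(scores, k):
    """Indices of the k highest-scoring features, best first."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:k]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [[1.0, 5.0], [1.2, 3.0], [0.9, 4.0], [3.0, 4.5], [3.1, 3.5], [2.9, 4.0]]
y = [0, 0, 0, 1, 1, 1]

scores = fisher_scores(X, y)
selected = select_top_k(scores, 1)
```

The score is computed from the data alone, before any classifier is trained, which is why filters of this kind are typically cheaper than wrapper or embedded approaches.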
