A Meta-Heuristic Regression-Based Feature Selection for Predictive Analytics

Selecting an optimal feature subset from high-dimensional data with a very large number of features is an NP-complete problem. Because conventional optimization techniques are unable to tackle large-scale feature selection problems, meta-heuristic algorithms are widely used instead. In this paper, we propose a feature selection method that combines particle swarm optimization (PSO) with regression techniques. The selected features are then used to classify the data with k-nearest neighbour (KNN) and Bayesian classifiers, and classification accuracy serves as the criterion for evaluating classifier performance. Several high-dimensional data sets are used to evaluate the usefulness of the proposed approach. The results show that our approach outperforms other conventional feature selection algorithms.
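As a rough illustration of PSO-driven feature selection, the following is a minimal sketch of an inertia-weighted binary PSO wrapper, in which each particle is a 0/1 mask over the features. It is not the method proposed in this paper: the abstract does not specify how regression enters the search, so the fitness here is simply leave-one-out 1-nearest-neighbour accuracy, swapped in as an assumption. The function names, parameter values (`w`, `c1`, `c2`, `v_max`), and the toy data generator are all hypothetical.

```python
import math
import random

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out 1-NN accuracy using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:  # an empty subset cannot classify anything
        return 0.0
    correct = 0
    for i in range(len(X)):
        best_d, best_label = float("inf"), None
        for k in range(len(X)):
            if k == i:
                continue
            d = sum((X[i][j] - X[k][j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += best_label == y[i]
    return correct / len(X)

def binary_pso(X, y, n_particles=10, n_iter=15, w=0.7, c1=1.5, c2=1.5,
               v_max=4.0, seed=0):
    """Search for a feature mask that maximises the fitness (assumed settings)."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pos = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(n_particles)]
    vel = [[0.0] * n_feat for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [loo_1nn_accuracy(X, y, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_feat):
                # Velocity update pulls towards personal and global bests.
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))
                # Sigmoid maps velocity to the probability that the bit is 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            fit = loo_1nn_accuracy(X, y, pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit

def make_toy_data(n=40, n_noise=6, seed=1):
    """Hypothetical data: 2 informative features followed by n_noise noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [label * 3.0 + rng.gauss(0, 0.5) for _ in range(2)]
        noise = [rng.gauss(0, 3.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y

X, y = make_toy_data()
mask, acc = binary_pso(X, y)
print("selected feature mask:", mask)
print("leave-one-out 1-NN accuracy:", acc)
```

Because the classifier is evaluated inside the search loop, this is a wrapper-style selector, and its cost grows with the number of particles, iterations, and samples. A regression-based criterion, as the paper proposes, would replace `loo_1nn_accuracy` while leaving the swarm update unchanged.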
