Wrapper-Based Feature Selection Using Self-adaptive Differential Evolution

Knowledge discovery in databases is a comprehensive procedure that enables researchers to extract useful knowledge and information from raw sample data. Problems can arise during this procedure, for example the curse of dimensionality, in which case reducing the database is desirable to remove redundant or irrelevant features. In this paper, we propose a wrapper-based feature selection algorithm that combines an artificial neural network with a self-adaptive differential evolution optimization algorithm. We test the performance of the algorithm on a bank marketing case study and show that it reduces the size of the database while simultaneously improving prediction performance on the observed problem.
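The wrapper idea can be illustrated with a minimal sketch, which is not the implementation evaluated in the paper. It makes several assumptions that the abstract does not state: each candidate is a real vector in [0, 1] decoded into a feature mask by a fixed threshold of 0.5; control parameters F and CR are self-adapted per individual in the jDE style (probability 0.1 of resampling, F drawn from [0.1, 1.0]); a small penalty favours smaller subsets; and a nearest-centroid classifier on synthetic data stands in for the artificial neural network and the bank marketing data. The names `make_data`, `fitness`, and `jde_select` are hypothetical.

```python
import numpy as np


def make_data(rng, n=200, n_informative=3, n_noise=7):
    """Synthetic binary data: informative columns first, then pure-noise columns."""
    y = rng.integers(0, 2, size=n)
    informative = rng.normal(size=(n, n_informative)) + 2.0 * y[:, None]
    noise = rng.normal(size=(n, n_noise))
    return np.hstack([informative, noise]), y


def fitness(mask, X_tr, y_tr, X_va, y_va):
    """Validation accuracy on the selected columns.

    Assumption: a nearest-centroid classifier stands in for the neural network.
    """
    if not mask.any():
        return 0.0  # an empty feature subset cannot classify anything
    A, B = X_tr[:, mask], X_va[:, mask]
    c0 = A[y_tr == 0].mean(axis=0)
    c1 = A[y_tr == 1].mean(axis=0)
    closer_to_c1 = np.linalg.norm(B - c1, axis=1) < np.linalg.norm(B - c0, axis=1)
    return float((closer_to_c1.astype(int) == y_va).mean())


def jde_select(X_tr, y_tr, X_va, y_va, pop_size=20, generations=30,
               threshold=0.5, penalty=0.01, seed=0):
    """Wrapper feature selection driven by self-adaptive DE (jDE-style sketch)."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    pop = rng.random((pop_size, d))   # real-valued genotypes in [0, 1]
    F = np.full(pop_size, 0.5)        # per-individual scale factor
    CR = np.full(pop_size, 0.9)       # per-individual crossover rate

    def score(vec):
        mask = vec > threshold        # decode genotype into a feature mask
        return fitness(mask, X_tr, y_tr, X_va, y_va) - penalty * mask.mean()

    scores = np.array([score(v) for v in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Self-adaptation: occasionally resample F and CR before mutation.
            f_i = 0.1 + 0.9 * rng.random() if rng.random() < 0.1 else F[i]
            cr_i = rng.random() if rng.random() < 0.1 else CR[i]
            # DE/rand/1 mutation with three distinct individuals other than i.
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            mutant = np.clip(pop[r1] + f_i * (pop[r2] - pop[r3]), 0.0, 1.0)
            # Binomial crossover; one position is always taken from the mutant.
            cross = rng.random(d) < cr_i
            cross[rng.integers(d)] = True
            trial = np.where(cross, mutant, pop[i])
            s = score(trial)
            # Greedy selection: the trial and its F, CR survive only if not worse.
            if s >= scores[i]:
                pop[i], scores[i], F[i], CR[i] = trial, s, f_i, cr_i
    best = int(np.argmax(scores))
    return pop[best] > threshold, float(scores[best])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X, y = make_data(rng)
    mask, best = jde_select(X[:140], y[:140], X[140:], y[140:])
    print("selected feature indices:", np.flatnonzero(mask))
    print("penalised validation score:", round(best, 3))
```

Because the fitness of each candidate is measured by training and evaluating a model on the selected columns, the search is a wrapper method in the usual sense; swapping the stand-in classifier for a neural network changes only `fitness`, at the price of a much higher cost per evaluation.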
