A novel filter feature selection algorithm based on Relief

The Relief algorithm is a feature selection algorithm that uses nearest neighbors to weight attributes. However, Relief only considers the correlation between features, which leads to low classification accuracy on noisy datasets whose interaction effect is weak. To overcome these weaknesses, a novel feature selection algorithm named Multidirectional Relief (MRelief) is proposed. MRelief includes four improvements. First, a multidirectional neighbor search method, which finds all neighbors within a distance threshold from different orientations, is used to obtain regularly distributed neighbors, so that the weights provided by MRelief are more accurate than those provided by Relief. Second, a novel objective function that incorporates the instances’ force coefficients is introduced to reduce the influence of noise and thereby improve the classification accuracy of MRelief. Third, subset generation is introduced into MRelief and combined with the maximum Pearson maximum distance (MPMD) to generate a promising candidate subset for feature selection. Finally, a novel multiclass margin definition is proposed and incorporated into MRelief to handle multiclass data. Extensive experiments on eleven UCI datasets and eleven real-world gene expression benchmark datasets show that MRelief performs significantly better than the other algorithms in our study, including LPLIR, ReliefF, LLH-Relief, MultiSURF, MSLIR-NN, MRMR, MPMD and STIR.
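
For readers unfamiliar with the baseline, the nearest-neighbor weighting that Relief performs can be illustrated as follows. This is a minimal sketch of the classic two-class Relief update, not of MRelief: it does not include the multidirectional neighbor search, force coefficients, MPMD subset generation, or multiclass margin described above. The function name `relief_weights`, the use of Manhattan distance on range-scaled features, and the choice to visit every instance once are assumptions made for illustration.

```python
def relief_weights(X, y):
    """Classic two-class Relief sketch: one pass over all instances.

    X is a list of numeric feature rows, y a list of class labels.
    Returns one weight per feature; larger means more relevant.
    """
    n, d = len(X), len(X[0])

    # Per-feature value range, used to scale differences into [0, 1].
    # A constant feature gets span 1.0 to avoid division by zero.
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        # Scaled absolute difference of feature f between two instances.
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        # Manhattan distance over scaled features (an assumption here).
        return sum(diff(f, a, b) for f in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        if not hits or not misses:
            continue  # no valid neighbor pair for this instance
        h = min(hits, key=lambda j: dist(X[i], X[j]))    # nearest hit
        m = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest miss
        for f in range(d):
            # Reward features that differ on the nearest miss,
            # penalize features that differ on the nearest hit.
            w[f] += (diff(f, X[i], X[m]) - diff(f, X[i], X[h])) / n
    return w


# Toy data (hypothetical): feature 0 separates the classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

On this toy data the class-separating feature should receive the larger weight. Because each instance contributes through a single nearest hit and a single nearest miss, one mislabeled or outlying neighbor can shift the weights noticeably, which is the kind of noise sensitivity the abstract sets out to reduce.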