Multilabel feature selection using ML-ReliefF and neighborhood mutual information for multilabel neighborhood decision systems

Abstract: Feature selection, an essential preprocessing step in multilabel classification, has been widely researched. Because multilabel datasets are diverse and complex, some feature selection methods are unstable and yield low predictive accuracy. To address these issues, this paper presents a novel multilabel feature selection method using multilabel ReliefF (ML-ReliefF) and neighborhood mutual information in multilabel neighborhood decision systems. First, to address the small number of randomly selected samples available when ReliefF searches for the nearest samples, the coefficient of difference and the average distance among the nearest similar and heterogeneous samples are introduced to evaluate the differences among samples, and the average differences among similar or heterogeneous samples are then defined. Using the Jaccard correlation coefficient, a new formula for updating feature weights is presented. Second, the margin of a sample is studied to granulate all samples under each label, and the concept of the neighborhood is given. By combining the algebraic and information views, some neighborhood entropy-based uncertainty measures for multilabel classification are investigated, and a new neighborhood mutual information is proposed. Furthermore, an optimization objective function is constructed to evaluate the candidate features in multilabel neighborhood decision systems, the properties of these measures are discussed, and the relationships among them are established. Finally, an improved ML-ReliefF algorithm is designed to preliminarily eliminate unrelated features and thereby decrease the computational complexity of multilabel classification, and a heuristic forward multilabel feature selection algorithm is developed to remove redundant features and improve classification performance. Experiments on thirteen multilabel datasets, in which the proposed algorithms are compared with representative methods, are conducted to verify their effectiveness in multilabel neighborhood decision systems.
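As a rough illustration of the kind of measure the abstract refers to, the following Python sketch computes the classical neighborhood entropy and neighborhood mutual information between one feature subset and one label. It is not the new measure proposed in the paper, whose formulas are not given in the abstract. The function names, the Euclidean distance, the fixed radius `delta`, the use of the natural logarithm, and the choice to take the joint neighborhood as the intersection of the two neighborhoods are all assumptions made for illustration.

```python
import math

def feature_neighborhoods(samples, features, delta):
    """For each sample, collect the indices of samples within Euclidean
    distance delta on the chosen feature indices (assumed granulation)."""
    result = []
    for xi in samples:
        nbrs = set()
        for j, xj in enumerate(samples):
            dist = math.sqrt(sum((xi[f] - xj[f]) ** 2 for f in features))
            if dist <= delta:
                nbrs.add(j)
        result.append(nbrs)
    return result

def label_neighborhoods(labels):
    """For each sample, collect the indices of samples that share its
    value on a single label (an equivalence class)."""
    return [{j for j, lj in enumerate(labels) if lj == li} for li in labels]

def neighborhood_entropy(nbrs):
    """Classical neighborhood entropy: -(1/n) * sum(log(|nbr(x_i)| / n))."""
    n = len(nbrs)
    return -sum(math.log(len(nb) / n) for nb in nbrs) / n

def neighborhood_mutual_information(nbrs_a, nbrs_b):
    """Classical neighborhood mutual information, with the joint
    neighborhood assumed to be the intersection of the two neighborhoods."""
    n = len(nbrs_a)
    total = 0.0
    for na, nb in zip(nbrs_a, nbrs_b):
        joint = na & nb  # always contains the sample itself, so never empty
        total += math.log(len(na) * len(nb) / (n * len(joint)))
    return -total / n

# Toy data (hypothetical): feature 0 separates the label, feature 1 is constant.
samples = [(0.0, 5.0), (0.1, 5.0), (1.0, 5.0), (1.1, 5.0)]
labels = [0, 0, 1, 1]

label_nbrs = label_neighborhoods(labels)
nmi_relevant = neighborhood_mutual_information(
    feature_neighborhoods(samples, [0], 0.2), label_nbrs)
nmi_irrelevant = neighborhood_mutual_information(
    feature_neighborhoods(samples, [1], 0.2), label_nbrs)
```

On this toy data the informative feature attains a mutual information equal to the label's neighborhood entropy (ln 2), while the constant feature scores zero. A forward selection procedure of the kind described above would therefore prefer feature 0.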
