Multi-label feature selection based on label distribution and feature complementarity

Abstract: Real-world data in many domains are high-dimensional, which can lead to high computational cost and poor performance in multi-label classification. Multi-label feature selection is an important preprocessing step in machine learning that mitigates the so-called “curse of dimensionality” by removing irrelevant and redundant features. However, the related labels of an instance generally differ in significance, an issue that most existing multi-label feature selection algorithms do not address. In this paper, we integrate label distribution learning into multi-label feature selection from the perspective of granular computing, while taking multiple feature correlations into account, and develop a novel multi-label feature selection algorithm based on label distribution and feature complementarity. The proposed algorithm consists of two primary parts: first, the different significances of the related labels for each instance in the multi-label data are obtained based on granular computing; second, feature complementarity is estimated based on neighborhood mutual information, without discretization. Comprehensive experiments on 10 publicly available multi-label datasets with six widely used metrics demonstrate the superiority of the proposed method over other state-of-the-art methods. The results indicate that the proposed method can significantly improve classifier performance while reducing the dimensionality of the original data.
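
To illustrate how mutual information can be estimated without discretization, the following is a minimal sketch of neighborhood mutual information in its commonly used form, NMI(R; S) = -(1/n) Σ_i log( |δ_R(x_i)| · |δ_S(x_i)| / (n · |δ_{R∪S}(x_i)|) ), where δ(x_i) is the set of samples within distance delta of x_i. It is not the full algorithm proposed in this paper. The function names, the default delta = 0.15, and the choice of Chebyshev distance (under which the joint neighborhood is exactly the intersection of the two individual neighborhoods) are assumptions made for this example.

```python
import numpy as np


def neighborhood_mask(X, delta):
    """Boolean n x n matrix; entry (i, j) is True when sample j lies within
    Chebyshev distance `delta` of sample i in the feature subspace X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a single feature as an n x 1 matrix
    # Pairwise Chebyshev (max-norm) distances between all samples.
    dist = np.abs(X[:, None, :] - X[None, :, :]).max(axis=2)
    return dist <= delta


def neighborhood_mutual_information(R, S, delta=0.15):
    """Neighborhood mutual information between feature subspaces R and S.

    Assumption: with the Chebyshev distance, the neighborhood in the joint
    subspace is the intersection of the neighborhoods in R and in S.
    """
    mask_r = neighborhood_mask(R, delta)
    mask_s = neighborhood_mask(S, delta)
    n = mask_r.shape[0]
    size_r = mask_r.sum(axis=1)
    size_s = mask_s.sum(axis=1)
    size_joint = (mask_r & mask_s).sum(axis=1)  # never zero: i is its own neighbor
    return -np.mean(np.log2(size_r * size_s / (n * size_joint)))


# Toy data: one continuous feature, one binary label, one constant feature.
x = np.array([0.0, 0.1, 1.0, 1.1])
y = np.array([0, 0, 1, 1])
c = np.zeros(4)

nmi_xy = neighborhood_mutual_information(x, y)  # x separates the two label groups
nmi_xc = neighborhood_mutual_information(x, c)  # a constant carries no information
```

In this toy example the continuous feature is used directly, with no binning; a feature whose neighborhoods align with the label groups receives a high score, while a constant feature receives a score of zero.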
