Multi-label feature selection based on neighborhood mutual information

Highlights: Unlike traditional multi-label feature selection methods, the proposed algorithm is derived from different cognitive viewpoints. A simple and intuitive metric is proposed to evaluate candidate features. The proposed algorithm is applicable to both categorical and numerical features. In our experiments, the proposed method outperforms several state-of-the-art multi-label feature selection methods.

Multi-label learning deals with data in which each instance is associated with a set of labels simultaneously. As in traditional single-label learning, high dimensionality is a stumbling block for multi-label learning. In this paper, we first introduce the margin of an instance to granulate all instances under different labels, and we define three different concepts of neighborhood based on different cognitive viewpoints. On this basis, we generalize neighborhood information entropy to fit multi-label learning and propose three new measures of neighborhood mutual information. We show that these new measures are a natural extension from single-label learning to multi-label learning. We then present an optimization objective function to evaluate the quality of candidate features, which can be solved by approximating the multi-label neighborhood mutual information. Finally, extensive experiments on publicly available data sets verify the effectiveness of the proposed algorithm in comparison with state-of-the-art methods.
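The idea of scoring candidate features by neighborhood mutual information can be illustrated with a small sketch. This is not the authors' algorithm: the margin-based granulation and the three neighborhood concepts are not reproduced. Instead, as an assumption, the neighborhood of an instance in a feature subset B is the set of instances within a fixed radius delta (Euclidean distance, features assumed scaled to [0, 1], delta >= 0). Single-label neighborhood mutual information uses one common fixed-radius form, NMI(B; y) = -(1/n) Σ_i log(|δ_B(x_i)| · |[x_i]_y| / (n · |δ_B(x_i) ∩ [x_i]_y|)). The multi-label score is simply the mean over labels (a hypothetical stand-in for the paper's measures), and features are chosen by plain greedy forward search rather than the paper's objective function. The function names and the delta default are hypothetical.

```python
import math
from typing import List, Sequence, Set

Matrix = Sequence[Sequence[float]]


def neighborhood(X: Matrix, i: int, feats: Sequence[int], delta: float) -> Set[int]:
    """Indices j whose Euclidean distance to x_i on the feature subset `feats` is <= delta."""
    return {
        j
        for j in range(len(X))
        if math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats)) <= delta
    }


def nmi(X: Matrix, y: Sequence[int], feats: Sequence[int], delta: float) -> float:
    """Fixed-radius neighborhood mutual information between `feats` and one discrete label y."""
    n = len(X)
    total = 0.0
    for i in range(n):
        nb = neighborhood(X, i, feats, delta)         # delta_B(x_i)
        same = {j for j in range(n) if y[j] == y[i]}  # [x_i]_y: instances sharing x_i's label value
        joint = nb & same                             # always contains i, so never empty
        total += math.log(len(nb) * len(same) / (n * len(joint)))
    return -total / n


def multilabel_nmi(X: Matrix, Y: Sequence[Sequence[int]], feats: Sequence[int], delta: float) -> float:
    """ASSUMPTION: multi-label score = mean of the single-label NMI over all label columns."""
    q = len(Y[0])
    return sum(nmi(X, [row[col] for row in Y], feats, delta) for col in range(q)) / q


def greedy_select(X: Matrix, Y: Sequence[Sequence[int]], k: int, delta: float = 0.2) -> List[int]:
    """Greedy forward search: repeatedly add the feature that maximizes the multi-label score."""
    selected: List[int] = []
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < k:
        # max() returns the first maximal candidate, so lower feature indices win ties
        best = max(remaining, key=lambda f: multilabel_nmi(X, Y, selected + [f], delta))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: feature 0 determines label 0, feature 1 determines label 1, feature 2 is constant.
X_toy = [[0.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [1.0, 1.0, 0.5]]
Y_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]
print(greedy_select(X_toy, Y_toy, 2))  # -> [0, 1]
```

On the toy data the constant feature scores zero (every neighborhood is the whole data set), each informative feature alone scores ln 2 / 2, and the pair {0, 1} reaches ln 2, so the greedy search keeps both informative features and skips the constant one. A fixed radius is the simplest choice for a sketch; the abstract's margin-based granulation would instead adapt each instance's neighborhood to the labels.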
