Multi-label Feature Selection via Information Gain

Multi-label classification has gained extensive attention recently. Compared with traditional classification, multi-label classification allows one instance to associate with multiple labels. The curse of dimensionality existing in multi-label data presents a challenge to the performance of multi-label classifiers. Multi-label feature selection is a powerful tool for high-dimension problem. However, the existing feature selection methods are unable to take both computational complexity and label correlation into consideration. To address this problem, a new approach based on information gain for multi-label feather selection (IGMF) is presented in this paper. In the process of IGMF, Information gain between a feature and label set is exploited to measure the importance of the feature and label corrections. After that, the optimal feature subset are obtained by setting the threshold value. A series of experimental results show that IGMF can promote performance of multi-label classifiers.

[1]  A.N. Srivastava,et al.  Discovering recurring anomalies in text reports regarding complex space systems , 2005, 2005 IEEE Aerospace Conference.

[2]  Grigorios Tsoumakas,et al.  An Empirical Study of Lazy Multilabel Classification Algorithms , 2008, SETN.

[3]  Grigorios Tsoumakas,et al.  Multi-Label Classification of Music into Emotions , 2008, ISMIR.

[4]  Dae-Won Kim,et al.  Feature selection for multi-label classification using multivariate mutual information , 2013, Pattern Recognit. Lett..

[5]  Josef Kittler,et al.  Floating search methods in feature selection , 1994, Pattern Recognit. Lett..

[6]  Thomas M. Cover,et al.  Elements of Information Theory , 2005 .

[7]  Huan Liu,et al.  Toward integrating feature selection algorithms for classification and clustering , 2005, IEEE Transactions on Knowledge and Data Engineering.

[8]  Jianmin Zhao,et al.  A multi-label classification algorithm based on label-specific features , 2011, Wuhan University Journal of Natural Sciences.

[9]  Grigorios Tsoumakas,et al.  Correlation-Based Pruning of Stacked Binary Relevance Models for Multi-Label Learning , 2009 .

[10]  George A. Vouros,et al.  Artificial Intelligence: Theories, Models and Applications, 5th Hellenic Conference on AI, SETN 2008, Syros, Greece, October 2-4, 2008. Proceedings , 2008, SETN.

[11]  Huan Liu,et al.  Feature Selection: An Ever Evolving Frontier in Data Mining , 2010, FSDM.

[12]  Chris H. Q. Ding,et al.  Multi-label ReliefF and F-statistic feature selections for image annotation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[13]  Gavin Brown,et al.  A New Perspective for Information Theoretic Feature Selection , 2009, AISTATS.

[14]  Zhi-Hua Zhou,et al.  Multilabel dimensionality reduction via dependence maximization , 2008, TKDD.

[15]  Zhi-Hua Zhou,et al.  ML-KNN: A lazy learning approach to multi-label learning , 2007, Pattern Recognit..

[16]  Jacob Zahavi,et al.  Using simulated annealing to optimize the feature selection problem in marketing applications , 2006, Eur. J. Oper. Res..

[17]  Víctor Robles,et al.  Feature selection for multi-label naive Bayes classification , 2009, Inf. Sci..

[18]  Grigorios Tsoumakas,et al.  Mining Multi-label Data , 2010, Data Mining and Knowledge Discovery Handbook.

[19]  I. Jolliffe Principal Component Analysis , 2002 .

[20]  R. Fisher THE USE OF MULTIPLE MEASUREMENTS IN TAXONOMIC PROBLEMS , 1936 .

[21]  Pedro Larrañaga,et al.  A review of feature selection techniques in bioinformatics , 2007, Bioinform..

[22]  Lior Rokach,et al.  Data Mining and Knowledge Discovery Handbook, 2nd ed , 2010, Data Mining and Knowledge Discovery Handbook, 2nd ed..

[23]  Yiqin Wang,et al.  Symptom selection for multi-label data of inquiry diagnosis in traditional Chinese medicine , 2013, Science China Information Sciences.

[24]  Chen Jie-xin Feature selection for multi-label data by using simulated annealing , 2011 .

[25]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[26]  Eyke Hüllermeier,et al.  Combining Instance-Based Learning and Logistic Regression for Multilabel Classification , 2009, ECML/PKDD.

[27]  Yan Chen,et al.  Embedded Feature Selection for Multi-label Classification of Music Emotions , 2012, Int. J. Comput. Intell. Syst..

[28]  Gert R. G. Lanckriet,et al.  Semantic Annotation and Retrieval of Music and Sound Effects , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[29]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.