Label Construction for Multi-label Feature Selection

Multi-label learning handles datasets in which each instance is associated with multiple labels, which are often correlated. Like other machine learning tasks, multi-label learning suffers from the curse of dimensionality, which can be mitigated by dimensionality reduction techniques such as feature selection. The standard approach to multi-label feature selection transforms the multi-label dataset into single-label datasets and then applies traditional feature selection algorithms. However, this approach often ignores label dependence. This work proposes an alternative method, LCFS, which constructs new labels based on relations between the original labels and uses them to augment the label set of the original dataset. The augmented dataset is then submitted to the standard multi-label feature selection approach. Experiments using Information Gain as the feature evaluation measure were carried out on 10 multi-label benchmark datasets. For each dataset, the quality of the selected features was assessed by the quality of the classifiers built from them, comparing features selected by the standard approach on the original dataset with features selected on the datasets constructed by four LCFS settings. The results show that LCFS settings with simple strategies based on pairs of labels give rise to better classifiers than the standard approach applied to the original dataset. Moreover, these good results are achieved when a small number of features are selected.
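
The pipeline described above can be illustrated with a minimal sketch. It assumes discrete features and binary labels, and it builds one new label per pair of original labels with a logical AND. The AND operator, the averaging of Information Gain over labels, and all function names (`construct_pair_labels`, `rank_features`, and so on) are assumptions made for illustration only; the abstract does not specify which construction strategies or aggregation LCFS actually uses.

```python
import math
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, label):
    """IG(label; feature) = H(label) - H(label | feature), discrete columns."""
    n = len(label)
    conditional = 0.0
    for value in set(feature):
        subset = [l for f, l in zip(feature, label) if f == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(label) - conditional


def construct_pair_labels(Y, combine=lambda a, b: a & b):
    """Append one constructed label per pair of original labels.

    `combine` defaults to logical AND, an assumed strategy chosen for
    illustration; it is not claimed to be one of the paper's settings.
    """
    pairs = list(combinations(range(len(Y[0])), 2))
    return [row + [combine(row[i], row[j]) for i, j in pairs] for row in Y]


def rank_features(X, Y):
    """Rank features by mean IG over all label columns (assumed aggregation)."""
    features = list(zip(*X))
    labels = list(zip(*Y))
    scores = [
        sum(information_gain(f, l) for l in labels) / len(labels)
        for f in features
    ]
    order = sorted(range(len(features)), key=lambda k: scores[k], reverse=True)
    return order, scores


# Toy data: feature 0 copies label 0, feature 1 copies label 1,
# and feature 2 is active only when both labels are present.
X = [[0, 0, 0], [0, 1, 0], [1, 0, 0], [1, 1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

original_order, original_scores = rank_features(X, Y)
Y_augmented = construct_pair_labels(Y)
augmented_order, augmented_scores = rank_features(X, Y_augmented)

print("ranking on original labels: ", original_order)
print("ranking on augmented labels:", augmented_order)
```

On this toy data, feature 2 ranks last when only the original labels are scored and first once the constructed pair label is added, which is the kind of label-dependence signal that a per-label transformation alone can miss.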
