A Comparison of Multi-label Feature Selection Methods using the Problem Transformation Approach

Feature selection is an important task in machine learning: it reduces dataset dimensionality by removing irrelevant and/or redundant features. A large body of research addresses feature selection in single-label data, where many measures have been proposed to filter out irrelevant features, but far less work exists for multi-label data. This work proposes multi-label feature selection methods based on the filter approach. To this end, two standard problem transformation approaches, which convert multi-label data into single-label data, are used. Each of these approaches is combined with one of two feature importance measures, ReliefF and Information Gain, to score the goodness of features, which gives rise to four multi-label feature selection methods. A thorough experimental evaluation of these methods was carried out on 10 benchmark datasets. Results show that ReliefF is able to select fewer features without diminishing the quality of the classifiers built from the selected features.
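The combination of a problem transformation with a filter measure can be illustrated with a minimal sketch. The abstract does not name the two transformation approaches, so the sketch assumes Label Powerset (each distinct label set becomes one class) and Binary Relevance (one binary problem per label), and it uses Information Gain on discrete features only. The function names, the toy data, and the choice to average per-label scores under Binary Relevance are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of hashable class values."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) - H(labels | feature) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for value, y in zip(feature, labels):
        groups.setdefault(value, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def label_powerset(Y):
    """Assumed transformation 1: each distinct label set is one class."""
    return [tuple(row) for row in Y]


def binary_relevance(Y):
    """Assumed transformation 2: one binary target per label."""
    return [list(col) for col in zip(*Y)]


def rank_features(X, Y, transform="lp"):
    """Score every feature against the transformed single-label target(s).

    Under Binary Relevance the per-label scores are averaged; this
    aggregation is an assumption made for the sketch.
    """
    columns = list(zip(*X))
    targets = [label_powerset(Y)] if transform == "lp" else binary_relevance(Y)
    scores = [
        sum(information_gain(col, t) for t in targets) / len(targets)
        for col in columns
    ]
    ranking = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)
    return ranking, scores


# Toy data (hypothetical): feature 0 determines both labels, feature 1 is noise.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]

lp_ranking, lp_scores = rank_features(X, Y, transform="lp")
br_ranking, br_scores = rank_features(X, Y, transform="br")
```

On this toy data both transformations rank feature 0 ahead of feature 1. Replacing `information_gain` with a ReliefF scorer would give the other two of the four combinations described above.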
