Information Gain Feature Selection for Multi-Label Classification

In many important application domains, such as text categorization, biomolecular analysis, scene or video classification, and medical diagnosis, instances are naturally associated with more than one class label, giving rise to multi-label classification problems. This has led, in recent years, to a substantial amount of research on multi-label classification. In particular, many feature selection methods have been developed to identify relevant and informative features for multi-label classification. However, most methods proposed for this task rely on transforming the multi-label data set into a single-label one. Moreover, no single work has carried out a comprehensive evaluation of the various multi-label classification techniques coupled with feature selection methods over data sets from different domains. In this work, we perform such an experimental evaluation and also propose an adaptation of the information gain feature selection technique that handles multi-label data directly.
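
To make the transformation-based baseline concrete, the following minimal sketch scores discrete features by information gain after a label powerset transformation, in which each distinct label set is treated as one single-label class. It illustrates the kind of problem transformation the abstract contrasts against, not the adaptation proposed in this work, whose details are not given here. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of hashable values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, classes):
    """IG = H(class) - H(class | feature) for one discrete feature."""
    n = len(classes)
    conditional = 0.0
    for v in set(feature):
        # Classes of the instances that take value v on this feature.
        subset = [c for f, c in zip(feature, classes) if f == v]
        conditional += (len(subset) / n) * entropy(subset)
    return entropy(classes) - conditional


def label_powerset(label_sets):
    """Assumed transformation: each distinct label set becomes one class."""
    return [frozenset(s) for s in label_sets]


# Hypothetical toy data: four instances, two labels, two binary features.
labels = [{"sports"}, {"sports", "politics"}, {"politics"}, {"sports"}]
f_informative = [1, 1, 0, 1]  # equals 1 exactly when "sports" is present
f_constant = [0, 0, 0, 0]     # carries no information about the labels

classes = label_powerset(labels)
ig_informative = information_gain(f_informative, classes)
ig_constant = information_gain(f_constant, classes)
```

Ranking features by this score and keeping the top-ranked ones gives a simple filter. One known drawback of the label powerset transformation is that the number of classes can grow quickly with the number of distinct label sets, which is part of the motivation for handling multi-label data directly.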
