Confidence factor and feature selection for semi-supervised multi-label classification methods

In this paper, we investigate two important problems in multi-label classification: the scarcity of labeled instances and the high dimensionality of the data. Multi-label classification, in which an instance can be associated with more than one label simultaneously, has been widely studied in the literature. One of the main issues is that many multi-label methods require a large number of labeled instances in order to generalize well. To address this problem, we use semi-supervised learning, which combines labeled and unlabeled instances during training. In this setting, semi-supervised learning can become an essential tool for automatically assigning labels to unlabeled instances. This paper therefore presents four semi-supervised methods for multi-label classification that use a confidence parameter in the automatic label assignment process. To assess the feasibility of these methods, we conduct an empirical analysis on high-dimensional datasets, evaluating their performance in different situations. Because the datasets are high-dimensional, we also apply a feature selection algorithm to reduce the number of features used by the classification methods.
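The role of a confidence parameter in automatic label assignment can be illustrated with a minimal self-training sketch. It is not one of the four methods presented in the paper. It assumes a binary relevance decomposition (one independent decision per label), a nearest-centroid base classifier, and a hypothetical acceptance rule in which an unlabeled instance is added to the training set only when every one of its per-label confidences reaches the threshold. All function names and the default `threshold=0.8` are illustrative choices.

```python
import math

def centroid(points):
    """Mean vector of a non-empty list of equal-length feature vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def positive_score(x, pos_centroid, neg_centroid):
    """Score in [0, 1]; closer to the positive centroid gives a higher score."""
    d_pos = math.dist(x, pos_centroid)
    d_neg = math.dist(x, neg_centroid)
    if d_pos + d_neg == 0.0:
        return 0.5  # x coincides with both centroids: no evidence either way
    return d_neg / (d_pos + d_neg)

def predict_with_confidence(x, X, Y):
    """Binary relevance: one nearest-centroid decision per label.

    Returns (labels, confidences), both lists with one entry per label.
    """
    labels, confidences = [], []
    for j in range(len(Y[0])):
        pos = [xi for xi, yi in zip(X, Y) if yi[j] == 1]
        neg = [xi for xi, yi in zip(X, Y) if yi[j] == 0]
        if not pos or not neg:
            # Only one class observed for this label: predict it, but with
            # zero confidence so the instance is never auto-labeled.
            labels.append(1 if pos else 0)
            confidences.append(0.0)
            continue
        s = positive_score(x, centroid(pos), centroid(neg))
        labels.append(1 if s >= 0.5 else 0)
        confidences.append(max(s, 1.0 - s))
    return labels, confidences

def self_train(X_lab, Y_lab, X_unlab, threshold=0.8, max_rounds=10):
    """Confidence-thresholded self-training (illustrative assumption).

    Returns the enlarged training set (X, Y) and the instances left unlabeled.
    """
    X, Y = list(X_lab), [list(y) for y in Y_lab]
    pool = list(X_unlab)
    for _ in range(max_rounds):
        accepted, remaining = [], []
        for x in pool:
            labels, conf = predict_with_confidence(x, X, Y)
            # Accept only if the least confident label clears the threshold.
            if min(conf) >= threshold:
                accepted.append((x, labels))
            else:
                remaining.append(x)
        if not accepted:
            break  # nothing confident enough: stop early
        for x, labels in accepted:
            X.append(x)
            Y.append(labels)
        pool = remaining
    return X, Y, pool
```

Raising the threshold admits fewer automatically labeled instances but lowers the risk of propagating wrong labels into later rounds; lowering it has the opposite effect. In a high-dimensional setting, a feature selection step would typically be applied to the feature vectors before calling `self_train`.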
