Multi-label semi-supervised classification through optimum-path forest

Abstract: Multi-label classification consists of assigning one or more classes to each sample in a given dataset. However, the design of a multi-label classifier is usually limited by a small number of supervised samples compared to the number of possible label combinations. This scenario favors semi-supervised learning methods, which can cope with the scarcity of supervised samples by adding unsupervised ones to the training set. Recently, we proposed a semi-supervised learning method based on optimum connectivity for single-label classification. In this work, we extend it to multi-label classification with a considerable gain in effectiveness. After a single-label data transformation, the method propagates labels from supervised to unsupervised samples, as in the original approach, by assuming that samples from the same class are more closely connected through sequences of nearby samples than samples from distinct classes. Because this procedure is more reliable in high-density regions of the feature space, an additional step repropagates labels from the maxima of a probability density function to correct possible labeling errors from the previous step. Finally, the data transformation is reversed to obtain multiple labels per sample. The new approach is experimentally validated on several datasets in comparison with state-of-the-art methods.
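
The abstract does not say which single-label transformation is used. As an illustration only, the sketch below assumes a label-powerset style mapping, in which each distinct label combination becomes one single-label class and the mapping is inverted after propagation. The function names `to_single_label` and `to_multi_label`, and the toy data, are hypothetical and are not taken from the paper.

```python
from typing import Dict, Iterable, List, Tuple

LabelSet = Tuple[int, ...]


def to_single_label(label_sets: Iterable[Iterable[int]]) -> Tuple[List[int], Dict[int, LabelSet]]:
    """Map each distinct label combination to one integer class id.

    Assumption: a label-powerset style transformation; the paper may use
    a different one.
    """
    combo_to_class: Dict[LabelSet, int] = {}
    single: List[int] = []
    for labels in label_sets:
        key = tuple(sorted(set(labels)))  # order-independent combination
        if key not in combo_to_class:
            combo_to_class[key] = len(combo_to_class)
        single.append(combo_to_class[key])
    # Inverse mapping, used to reverse the transformation later.
    class_to_combo = {c: k for k, c in combo_to_class.items()}
    return single, class_to_combo


def to_multi_label(single: Iterable[int], class_to_combo: Dict[int, LabelSet]) -> List[LabelSet]:
    """Reverse the transformation: one class id back to a set of labels."""
    return [class_to_combo[c] for c in single]


# Toy supervised samples, each with one or more labels.
supervised = [[0, 2], [1], [2, 0], [1, 3]]
single, class_to_combo = to_single_label(supervised)

# Hypothetical single-label classes that a propagation step assigned to two
# unsupervised samples (the propagation itself is not implemented here).
propagated = single + [0, 2]
restored = to_multi_label(propagated, class_to_combo)
```

Under this assumption, any single-label semi-supervised method can run on the class ids in `single`, and `to_multi_label` recovers multiple labels per sample afterwards. A known limitation of this kind of mapping is that it can only predict label combinations that appear among the supervised samples.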
