Large-Scale Multi-Label Learning with Incomplete Label Assignments

Multi-label learning deals with classification problems in which each instance can be assigned multiple labels simultaneously. Conventional multi-label learning approaches mainly focus on exploiting label correlations. They usually assume, explicitly or implicitly, that the label set of every training instance is complete, with no missing labels. However, in many real-world multi-label datasets, the label assignments for training instances are incomplete: some ground-truth labels are missed by the labeler. This problem is especially common when the number of instances is very large and the labeling cost is very high, which makes it almost impossible to obtain a fully labeled training set. In this paper, we study the problem of large-scale multi-label learning with incomplete label assignments. We propose an approach, called Mpu, based on positive and unlabeled stochastic gradient descent and stacked models. Unlike prior work, our method handles missing labels and label correlations simultaneously, both effectively and efficiently, and it is highly scalable, with time complexity linear in the size of the data. Extensive experiments on two real-world multi-label datasets show that our Mpu model consistently outperforms other commonly used baselines.
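To make the two ingredients named above more concrete, the following is a minimal sketch and not the Mpu algorithm itself, whose details are not given in this abstract. It assumes one common way to combine the ideas: each label gets a logistic model trained by stochastic gradient descent in which observed labels count as positives and all other entries count as unlabeled, down-weighted negatives. A second, stacked level is then trained on the original features plus the first-level scores so that it can pick up label correlations. The function names, the `unlabeled_weight` parameter, and the toy data are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    # Predicted probability of one label for feature vector x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def sgd_pu_logistic(X, y, unlabeled_weight=0.3, lr=0.1, epochs=50, seed=0):
    """SGD for one label. y[i] == 1 is an observed positive; y[i] == 0 is
    unlabeled (possibly a missed positive), so its loss is down-weighted.
    The weighting scheme is an illustrative assumption."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            p = score(X[i], w, b)
            weight = 1.0 if y[i] == 1 else unlabeled_weight
            g = weight * (p - y[i])  # gradient of the weighted log loss w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, X[i])]
            b -= lr * g
    return w, b


def augment(X, level0):
    # Append the level-0 score of every label to each feature vector.
    return [list(x) + [score(x, w, b) for w, b in level0] for x in X]


def fit_stacked(X, Y, **kw):
    """Level 0: one model per label on X.
    Level 1: one model per label on X plus all level-0 scores."""
    n_labels = len(Y[0])
    level0 = [sgd_pu_logistic(X, [row[k] for row in Y], **kw) for k in range(n_labels)]
    X_aug = augment(X, level0)
    level1 = [sgd_pu_logistic(X_aug, [row[k] for row in Y], **kw) for k in range(n_labels)]
    return level0, level1


def predict_proba(X, level0, level1):
    # One row per instance, one probability per label.
    return [[score(xa, w, b) for w, b in level1] for xa in augment(X, level0)]


# Toy data (hypothetical): the second instance resembles the first,
# but its label 0 was never recorded.
X_toy = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.1]]
Y_toy = [[1, 0], [0, 0], [0, 1], [0, 1]]
level0, level1 = fit_stacked(X_toy, Y_toy)
probs = predict_proba(X_toy, level0, level1)
```

Each SGD update touches a single instance, so one pass over the data costs time proportional to the number of instances times the number of features per label, which is in keeping with the linear scaling the abstract describes.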
