Multi-label learning with incomplete class assignments

We consider a special type of multi-label learning in which the class assignments of training examples are incomplete. For example, an instance whose true class assignment is (c1, c2, c3) may be assigned only to class c1 when it is used as a training sample. We refer to this problem as multi-label learning with incomplete class assignments. Incompletely labeled data is frequently encountered when the number of classes is very large (hundreds, as in the MIR Flickr dataset) or when there is considerable ambiguity between classes (e.g., jet vs. plane). In both cases, it is difficult for users to provide complete class assignments for objects. We propose a ranking-based multi-label learning framework that explicitly addresses the challenge of learning from incompletely labeled data by exploiting the group lasso technique to combine the ranking errors. We present a learning algorithm that is empirically shown to be efficient for solving the related optimization problem. Our empirical study shows that the proposed framework is more effective than state-of-the-art multi-label learning algorithms in dealing with incompletely labeled data.
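To make the idea of combining ranking errors with a group lasso concrete, the following is a minimal sketch and not the formulation from the paper itself. It assumes a hinge-style ranking error between each assigned class and each unassigned class, and it assumes the errors belonging to one assigned class form a group that is combined by an L2 norm. The function names, the margin parameter, and the example scores are all hypothetical.

```python
import math


def ranking_errors(scores, assigned, margin=1.0):
    """Hinge ranking errors for one instance (assumed form, for illustration).

    scores   -- list of per-class prediction scores
    assigned -- set of class indices observed as positive in training
    Returns a dict mapping each assigned class k to the list of errors
    max(0, margin - (scores[k] - scores[l])) over all unassigned classes l.
    """
    unassigned = [l for l in range(len(scores)) if l not in assigned]
    return {
        k: [max(0.0, margin - (scores[k] - scores[l])) for l in unassigned]
        for k in assigned
    }


def group_lasso_loss(scores, assigned, margin=1.0):
    """Combine each group of ranking errors with an L2 norm, then sum groups."""
    groups = ranking_errors(scores, assigned, margin)
    return sum(math.sqrt(sum(e * e for e in errs)) for errs in groups.values())


def summed_loss(scores, assigned, margin=1.0):
    """Baseline for comparison: add up all ranking errors directly."""
    groups = ranking_errors(scores, assigned, margin)
    return sum(sum(errs) for errs in groups.values())


# True labels are (c1, c2, c3), but only c1 (index 0) is observed.
scores = [2.0, 1.8, 1.7, -1.0]
observed = {0}
print(group_lasso_loss(scores, observed))
print(summed_loss(scores, observed))
```

In this toy case the unobserved but correct classes c2 and c3 score close to c1, so they produce nonzero ranking errors. The L2 combination yields a smaller total than the plain sum, which illustrates one way a grouped norm can soften the penalty incurred by labels that are missing rather than truly negative.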
