Large-scale Multi-label Learning with Missing Labels

Multi-label classification has attracted significant interest in recent years. However, existing approaches do not adequately address two key challenges: (a) scaling to problems with a very large number of labels (say, millions), and (b) handling data with missing labels. In this paper, we address both problems directly by studying multi-label learning in a generic empirical risk minimization (ERM) framework. Despite its simplicity, the framework encompasses several recent label-compression-based methods, which can be derived as special cases. To optimize the ERM problem, we develop techniques that exploit the structure of specific loss functions, such as the squared loss, to obtain efficient algorithms. We further show that the framework admits excess risk bounds even in the presence of missing labels. The bounds are tight and demonstrate better generalization for low-rank-promoting trace-norm regularization than for (rank-insensitive) Frobenius-norm regularization. Finally, we present extensive empirical results on a variety of benchmark datasets, showing that our methods perform significantly better than existing label-compression-based methods and scale to very large datasets, such as a Wikipedia dataset with more than 200,000 labels.
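To make the setting concrete, the following is a minimal sketch of squared-loss ERM with missing labels and a low-rank linear predictor. It is an illustration under stated assumptions, not the algorithm from the paper: the factorization Z = W Hᵀ, the alternating ridge updates, and every name in it (`fit_masked_low_rank`, `update_W`, `update_H`, `lam`) are hypothetical choices made for this example. The Frobenius penalty on the two factors is the standard variational surrogate for the trace norm of W Hᵀ. The dense solve in `update_W` is only practical for small problems.

```python
import numpy as np


def objective(X, Y, mask, W, H, lam):
    """Squared loss on observed labels plus Frobenius penalties on the factors."""
    R = mask * (X @ W @ H.T - Y)  # residuals; unobserved entries contribute zero
    return 0.5 * np.sum(R ** 2) + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))


def update_H(X, Y, mask, W, lam):
    """With W fixed, each label's row of H is an independent ridge regression."""
    A = X @ W  # projected features, shape (n, k)
    k = A.shape[1]
    H = np.zeros((Y.shape[1], k))
    for j in range(Y.shape[1]):
        obs = mask[:, j].astype(bool)  # examples where label j is observed
        Aj = A[obs]
        H[j] = np.linalg.solve(Aj.T @ Aj + lam * np.eye(k), Aj.T @ Y[obs, j])
    return H


def update_W(X, Y, mask, H, lam):
    """With H fixed, solve the ridge problem in vec(W) (row-major) exactly."""
    d, k = X.shape[1], H.shape[1]
    G = lam * np.eye(d * k)
    b = np.zeros(d * k)
    for j in range(Y.shape[1]):
        obs = mask[:, j].astype(bool)
        # Row i of Phi is kron(x_i, h_j), so Phi @ W.ravel() equals x_i^T W h_j.
        Phi = np.kron(X[obs], H[j][None, :])
        G += Phi.T @ Phi
        b += Phi.T @ Y[obs, j]
    return np.linalg.solve(G, b).reshape(d, k)


def fit_masked_low_rank(X, Y, mask, rank, lam=1.0, iters=20, seed=0):
    """Alternate exact block updates; returns W, H and the objective history."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], rank))
    H = update_H(X, Y, mask, W, lam)
    history = []
    for _ in range(iters):
        W = update_W(X, Y, mask, H, lam)
        H = update_H(X, Y, mask, W, lam)
        history.append(objective(X, Y, mask, W, H, lam))
    return W, H, history


if __name__ == "__main__":
    # Synthetic data (an assumption for the demo): low-rank scores, 30% labels missing.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    Y = X @ rng.standard_normal((10, 3)) @ rng.standard_normal((3, 8))
    mask = rng.random(Y.shape) < 0.7  # True where a label is observed
    W, H, history = fit_masked_low_rank(X, Y, mask, rank=3, lam=0.1)
    print("objective, first vs. last sweep:", history[0], history[-1])
```

Because each block update minimizes the objective exactly over its own factor, the recorded objective should not increase from one sweep to the next. Only observed entries of `Y` are ever read, so whatever values sit in the missing positions have no effect on the fit.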
