Efficient max-margin multi-label classification with applications to zero-shot learning

The goal in multi-label classification is to tag a data point with the subset of relevant labels from a pre-specified set. Given a set of L labels, a data point can be tagged with any of the 2L possible subsets. The main challenge therefore lies in optimising over this exponentially large label space subject to label correlations.Our objective, in this paper, is to design efficient algorithms for multi-label classification when the labels are densely correlated. In particular, we are interested in the zero-shot learning scenario where the label correlations on the training set might be significantly different from those on the test set.We propose a max-margin formulation where we model prior label correlations but do not incorporate pairwise label interaction terms in the prediction function. We show that the problem complexity can be reduced from exponential to linear while modelling dense pairwise prior label correlations. By incorporating relevant correlation priors we can handle mismatches between the training and test set statistics. Our proposed formulation generalises the effective 1-vs-All method and we provide a principled interpretation of the 1-vs-All technique.We develop efficient optimisation algorithms for our proposed formulation. We adapt the Sequential Minimal Optimisation (SMO) algorithm to multi-label classification and show that, with some book-keeping, we can reduce the training time from being super-quadratic to almost linear in the number of labels. Furthermore, by effectively re-utilizing the kernel cache and jointly optimising over all variables, we can be orders of magnitude faster than the competing state-of-the-art algorithms. We also design a specialised algorithm for linear kernels based on dual co-ordinate ascent with shrinkage that lets us effortlessly train on a million points with a hundred labels.

[1]  Steve McLaughlin,et al.  Comparative study of textural analysis techniques to characterise tissue from intravascular ultrasound , 1996, Proceedings of 3rd IEEE International Conference on Image Processing.

[2]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[3]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[4]  B. Schölkopf,et al.  Advances in kernel methods: support vector learning , 1999 .

[5]  Jason Weston,et al.  A kernel method for multi-labelled classification , 2001, NIPS.

[6]  Naonori Ueda,et al.  Parametric Mixture Models for Multi-Labeled Text , 2002, NIPS.

[7]  Ben Taskar,et al.  Max-Margin Markov Networks , 2003, NIPS.

[8]  Koby Crammer,et al.  A Family of Additive Online Algorithms for Category Ranking , 2003, J. Mach. Learn. Res..

[9]  Yiming Yang,et al.  RCV1: A New Benchmark Collection for Text Categorization Research , 2004, J. Mach. Learn. Res..

[10]  Claudio Gentile,et al.  Incremental Algorithms for Hierarchical Classification , 2004, J. Mach. Learn. Res..

[11]  Jiebo Luo,et al.  Learning multi-label scene classification , 2004, Pattern Recognit..

[12]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[13]  Lei Wang,et al.  Multilabel SVM active learning for image classification , 2004, 2004 International Conference on Image Processing, 2004. ICIP '04..

[14]  S. Sathiya Keerthi,et al.  Convergence of a Generalized SMO Algorithm for SVM Classifier Design , 2002, Machine Learning.

[15]  Yoram Singer,et al.  BoosTexter: A Boosting-based System for Text Categorization , 2000, Machine Learning.

[16]  Charles A. Micchelli,et al.  Learning Multiple Tasks with Kernel Methods , 2005, J. Mach. Learn. Res..

[17]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[18]  Chih-Jen Lin,et al.  Working Set Selection Using Second Order Information for Training Support Vector Machines , 2005, J. Mach. Learn. Res..

[19]  Juho Rousu,et al.  Kernel-Based Learning of Hierarchical Multilabel Classification Models , 2006, J. Mach. Learn. Res..

[20]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[21]  Thomas Hofmann,et al.  Exploiting Known Taxonomies in Learning Overlapping Concepts , 2007, IJCAI.

[22]  Grigorios Tsoumakas,et al.  Multi-Label Classification: An Overview , 2007, Int. J. Data Warehous. Min..

[23]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[24]  P. König Faculty Opinions recommendation of Predicting human brain activity associated with the meanings of nouns. , 2008 .

[25]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[26]  Tom Michael Mitchell,et al.  Predicting Human Brain Activity Associated with the Meanings of Nouns , 2008, Science.

[27]  Jieping Ye,et al.  Multi-label Multiple Kernel Learning , 2008, NIPS.

[28]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Geoffrey E. Hinton,et al.  Zero-shot Learning with Semantic Output Codes , 2009, NIPS.

[30]  S. Lucidi,et al.  Decomposition Algorithm Model for Singly Linearly-Constrained Problems Subject to Lower and Upper Bounds , 2009 .

[31]  Víctor Robles,et al.  Feature selection for multi-label naive Bayes classification , 2009, Inf. Sci..

[32]  Min-Ling Zhang,et al.  MIMLRBF: RBF neural networks for multi-instance multi-label learning , 2009, Neurocomputing.

[33]  Umberto Castellani,et al.  Multiple kernel learning , 2009 .

[34]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  John Langford,et al.  Multi-Label Prediction via Compressed Sensing , 2009, NIPS.

[36]  Lihi Zelnik-Manor,et al.  Large Scale Max-Margin Multi-Label Classification with Priors , 2010, ICML.

[37]  S. V. N. Vishwanathan,et al.  Multiple Kernel Learning and the SMO Algorithm , 2010, NIPS.

[38]  Gang Wang,et al.  Comparative object similarity for improved recognition with few or no examples , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[39]  Ali Farhadi,et al.  Attribute-centric recognition for cross-category generalization , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[40]  Eyke Hüllermeier,et al.  Graded Multilabel Classification: The Ordinal Case , 2010, ICML.

[41]  Ohad Shamir,et al.  Multiclass-Multilabel Classification with More Classes than Examples , 2010, AISTATS.

[42]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[43]  I. Song,et al.  Working Set Selection Using Second Order Information for Training Svm, " Complexity-reduced Scheme for Feature Extraction with Linear Discriminant Analysis , 2022 .