Online multi-label learning with accelerated nonsmooth stochastic gradient descent

Multi-label learning refers to methods for learning a set of functions that assign a set of relevant labels to each instance. A popular approach to multi-label learning is label ranking, in which ranking functions are learned to order all labels so that relevant labels rank higher than irrelevant ones. Rank-SVM is a representative label-ranking method that minimizes the ranking loss in the max-margin framework. However, its dual form involves a quadratic program that is generally solved in time cubic in the size of the training data. The primal form is appealing for developing online learning algorithms, but it involves a nonsmooth convex loss function. In this paper we present a method for online multi-label learning that minimizes the primal form using accelerated nonsmooth stochastic gradient descent, which was recently developed to extend Nesterov's smoothing method to the stochastic setting. Numerical experiments on several large-scale datasets demonstrate the computational efficiency and fast convergence of our proposed method compared with existing methods, including subgradient-based algorithms.
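The core idea described above, replacing a nonsmooth hinge-type loss with its Nesterov-smoothed surrogate and applying an accelerated stochastic gradient method to the primal, can be sketched in a minimal form. The snippet below is an illustration on a plain binary SVM with synthetic data, not the paper's multi-label ranking algorithm; the smoothing parameter `mu`, step size `eta`, and momentum `beta` are arbitrary choices made for this toy example.

```python
import numpy as np

def smoothed_hinge_grad(z, mu):
    # Nesterov smoothing of the hinge h(z) = max(0, 1 - z):
    #   h_mu(z) = max_{a in [0,1]} a*(1 - z) - (mu/2)*a^2,
    # whose maximizer is a* = clip((1 - z)/mu, 0, 1), giving
    # the smooth gradient dh_mu/dz = -a*.
    a = np.clip((1.0 - z) / mu, 0.0, 1.0)
    return -a

# Synthetic linearly separable binary data.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.sign(X @ w_true)

mu, lam, eta, beta = 0.1, 0.01, 0.1, 0.9
w = np.zeros(d)
v = np.zeros(d)
for t in range(2000):
    i = rng.integers(n)
    look = w + beta * v                 # Nesterov-style look-ahead point
    z = y[i] * (X[i] @ look)
    # Stochastic gradient of smoothed hinge plus L2 regularization.
    g = smoothed_hinge_grad(z, mu) * y[i] * X[i] + lam * look
    v = beta * v - eta * g              # accelerated (momentum) update
    w = w + v

train_acc = np.mean(np.sign(X @ w) == y)
```

Because the smoothed surrogate has a Lipschitz-continuous gradient, the accelerated update is well defined at every iterate, which is what permits the faster convergence rates compared with plain stochastic subgradient descent on the original nonsmooth loss.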
