Top-k Multiclass SVM

Class ambiguity is typical in image classification problems with a large number of classes. When classes are difficult to discriminate, it makes sense to allow k guesses and evaluate classifiers based on the top-k error instead of the standard zero-one loss. We propose top-k multiclass SVM as a direct method to optimize for top-k performance. Our generalization of the well-known multiclass SVM is based on a tight convex upper bound of the top-k error. We propose a fast optimization scheme based on an efficient projection onto the top-k simplex, which is of its own interest. Experiments on five datasets show consistent improvements in top-k accuracy compared to various baselines.

[1]  J. Borwein,et al.  Convex Analysis And Nonlinear Optimization , 2000 .

[2]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[3]  Wlodzimierz Ogryczak,et al.  Minimizing the sum of the k largest functions in linear time , 2003, Inf. Process. Lett..

[4]  Thorsten Joachims,et al.  A support vector method for multivariate performance measures , 2005, ICML.

[5]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[6]  Kaare Brandt Petersen,et al.  The Matrix Cookbook , 2006 .

[7]  Jason Weston,et al.  Solving multiclass support vector machines with LaRank , 2007, ICML '07.

[8]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[9]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[10]  K. Kiwiel Variable Fixing Algorithms for the Continuous Quadratic Knapsack Problem , 2008 .

[11]  Michael Patriksson,et al.  A survey on the continuous nonlinear resource allocation problem , 2008, Eur. J. Oper. Res..

[12]  Jun Liu,et al.  Efficient Euclidean projections in linear time , 2009, ICML '09.

[13]  Patrick Gallinari,et al.  Ranking with ordered weighted pairwise classification , 2009, ICML '09.

[14]  Antonio Torralba,et al.  Recognizing indoor scenes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[15]  Krista A. Ehinger,et al.  SUN database: Large-scale scene recognition from abbey to zoo , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[16]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[17]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[18]  Brendan J. Frey,et al.  Probabilistic n-Choose-k Models for Classification and Ranking , 2012, NIPS.

[19]  C. V. Jawahar,et al.  Blocks That Shout: Distinctive Parts for Scene Classification , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Jun Wu,et al.  Superpixel segmentation based structural scene recognition , 2013, MM '13.

[21]  Alexei A. Efros,et al.  Mid-level Visual Element Discovery as Discriminative Mode Seeking , 2013, NIPS.

[22]  Jean Ponce,et al.  Learning Discriminative Part Detectors for Image Classification and Cosegmentation , 2013, 2013 IEEE International Conference on Computer Vision.

[23]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[24]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[25]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[26]  Bernt Schiele,et al.  Scalable Multitask Representation Learning for Scene Classification , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Bolei Zhou,et al.  Learning Deep Features for Scene Recognition using Places Database , 2014, NIPS.

[28]  Svetlana Lazebnik,et al.  Multi-scale Orderless Pooling of Deep Convolutional Activation Features , 2014, ECCV.

[29]  Rong Jin,et al.  Top Rank Optimization in Linear Time , 2014, NIPS.

[30]  Maya R. Gupta,et al.  Training highly multiclass classifiers , 2014, J. Mach. Learn. Res..

[31]  Jorma Laaksonen,et al.  Convolutional Network Features for Scene Recognition , 2014, ACM Multimedia.

[32]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[33]  Michael Patriksson,et al.  Algorithms for the continuous nonlinear resource allocation problem - New implementations and numerical studies , 2015, Eur. J. Oper. Res..

[34]  Tong Zhang,et al.  Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.