Large scale multi-class classification with truncated nuclear norm regularization

Abstract In this paper, we consider the problem of multi-class image classification when the class behaviour has a low-rank structure, that is, when the classes can be embedded into a low-dimensional space. Traditional multi-class classification algorithms usually use the nuclear norm to approximate the rank of the weight matrix. Because the nuclear norm approximates the rank only loosely, we propose a new scalable multi-class classification algorithm that uses the recently proposed truncated nuclear norm as a better surrogate for the matrix rank, together with the multinomial logistic loss. To solve the resulting non-convex and non-smooth optimization problem, we further develop an efficient iterative procedure. In each iteration, by lifting the non-smooth convex subproblem into an infinite-dimensional ℓ1-norm regularized problem, a simple and efficient accelerated coordinate descent algorithm is applied to find the optimal solution. We conduct a series of evaluations on several public large-scale image datasets, where the experimental results show an encouraging improvement in classification accuracy for the proposed algorithm in comparison with state-of-the-art multi-class classification algorithms.
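
To make the difference between the two regularizers concrete, the following is a minimal sketch, assuming the standard definition of the truncated nuclear norm as the sum of all singular values except the r largest. The function names, the matrix sizes, and the choice r = 3 are illustrative assumptions and do not come from the paper; the abstract does not give the exact formulation or the optimization procedure, which this sketch does not attempt to reproduce.

```python
import numpy as np


def nuclear_norm(W):
    """Sum of all singular values of W (the convex surrogate of rank)."""
    return float(np.linalg.svd(W, compute_uv=False).sum())


def truncated_nuclear_norm(W, r):
    """Sum of the singular values of W after discarding the r largest.

    np.linalg.svd returns singular values in descending order, so
    slicing from index r keeps only the smallest min(m, n) - r values.
    """
    s = np.linalg.svd(W, compute_uv=False)
    return float(s[r:].sum())


# Hypothetical weight matrix: 20 features, 10 classes, exact rank 3.
rng = np.random.default_rng(0)
d, k, r = 20, 10, 3
W_low = rng.standard_normal((d, r)) @ rng.standard_normal((r, k))

# The nuclear norm still penalizes the three dominant singular values,
# while the truncated nuclear norm is numerically zero for rank-r input.
print("nuclear norm:          ", nuclear_norm(W_low))
print("truncated nuclear norm:", truncated_nuclear_norm(W_low, r))
```

For a matrix of rank at most r the truncated nuclear norm vanishes (up to floating-point error), whereas the nuclear norm grows with the magnitude of the leading singular values. This is the sense in which the truncated variant tracks the rank more closely.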
