Maximum-margin sparse coding

This work devises a maximum-margin sparse coding (MMSC) algorithm that jointly considers reconstruction loss and hinge loss in a single model. The sparse representation combined with the maximum-margin constraint is analogous to the kernel trick and the maximum-margin property of the support vector machine (SVM), which gives the proposed algorithm a basis for performing well in classification tasks. The key idea is to use labeled and unlabeled data to learn discriminative representations and model parameters simultaneously, making the data easier to classify in the new space. We use block coordinate descent to learn all components of the proposed model and give a detailed derivation of the update rules for the model variables. A theoretical analysis of the convergence of the proposed MMSC algorithm is provided, based on Zangwill's global convergence theorem. Most previous studies on dictionary learning suggest using an overcomplete dictionary to improve classification performance, but this is computationally intensive when the dimension of the input data is large. We conduct experiments on several real data sets, including the Extended YaleB, AR face, and Caltech101 data sets. The experimental results indicate that the proposed algorithm outperforms the comparison algorithms without requiring an overcomplete dictionary, which provides flexibility in dealing with high-dimensional data sets.
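To make the idea of a joint objective concrete, the following is a minimal sketch of how a reconstruction term, a sparsity penalty, and a hinge loss might be combined into one value. The abstract does not state the exact formulation, so the function name `mmsc_objective`, the weights `lam` and `C`, the L1 penalty, the binary labels, and the convention that a label of 0 marks an unlabeled sample are all assumptions made for illustration.

```python
def mmsc_objective(X, D, Z, w, b, y, lam=0.1, C=1.0):
    """Illustrative joint objective (an assumed form, not the paper's exact one).

    X: n samples, each a list of d features
    D: dictionary stored as d rows of k atom coefficients
    Z: n codes, each a list of k coefficients
    w, b: linear classifier acting on the codes
    y: labels in {-1, +1}; 0 marks an unlabeled sample (assumed convention)
    """
    recon = 0.0
    sparsity = 0.0
    hinge = 0.0
    for x_i, z_i, y_i in zip(X, Z, y):
        # Reconstruction loss ||x_i - D z_i||^2, used for every sample.
        for row, x_ij in zip(D, x_i):
            approx = sum(d_jk * z_ik for d_jk, z_ik in zip(row, z_i))
            recon += (x_ij - approx) ** 2
        # L1 penalty encourages sparse codes.
        sparsity += sum(abs(v) for v in z_i)
        # Hinge loss applies only to labeled samples.
        if y_i != 0:
            score = sum(w_k * z_ik for w_k, z_ik in zip(w, z_i)) + b
            hinge += max(0.0, 1.0 - y_i * score)
    # Margin regularizer on the classifier weights.
    reg = 0.5 * sum(w_k * w_k for w_k in w)
    return recon + lam * sparsity + C * hinge + reg


# Toy usage: two labeled samples and one unlabeled sample.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
D = [[1.0, 0.0], [0.0, 1.0]]
Z = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
value = mmsc_objective(X, D, Z, w=[1.0, -1.0], b=0.0, y=[1, -1, 0])
print(value)
```

Under this assumed form, a block coordinate descent scheme would alternately minimize the value over one of `D`, `Z`, or `(w, b)` while holding the others fixed. Unlabeled samples still shape the dictionary and codes through the reconstruction and sparsity terms.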
