Efficient Kernel Sparse Coding Via First-Order Smooth Optimization

We consider the problem of dictionary learning and sparse coding, where the task is to find a concise set of basis vectors that accurately represent the observed data using only a small number of active bases. The problem is typically formulated as an L1-regularized least-squares problem, and its computational difficulty stems from the nondifferentiable objective. Recent approaches to sparse coding have therefore focused mainly on accelerating the learning algorithm. In this paper, we propose a more efficient and scalable sparse coding algorithm based on the first-order smooth optimization technique. The algorithm provably finds the optimal sparse codes of the epsilon-approximate problem by solving a series of optimization subproblems, each of which admits an analytic solution, which makes it fast and scalable to large data sets. We further extend the algorithm to nonlinear sparse coding using the kernel trick, by showing that the representer theorem holds for the kernel sparse coding problem. This allows us to apply dual optimization, which essentially yields the same linear sparse coding problem in the dual variables. This is a significant advantage over existing methods, which suffer from local minima and restrict the form of the kernel function. The efficiency of our algorithms is demonstrated on natural stimuli data sets and several image classification problems.
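
To make the idea of smoothing the L1 term concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the standard objective 0.5 * ||D x - y||^2 + lam * ||x||_1, replaces each |x_j| with its Huber-type smooth approximation (parameter mu), and runs plain gradient descent rather than the paper's analytic subproblem updates. The names `sparse_code`, `huber_grad`, `lam`, `mu`, and `iters` are illustrative assumptions.

```python
def huber_grad(x, mu):
    """Gradient of the smoothed absolute value: clip(x_j / mu, -1, 1)."""
    return [max(-1.0, min(1.0, xj / mu)) for xj in x]


def sparse_code(D, y, lam=0.1, mu=1e-3, iters=2000):
    """Approximately minimize 0.5*||D x - y||^2 + lam*||x||_1.

    Illustrative sketch only: the L1 term is replaced by a Huber-type
    smooth surrogate, and plain gradient descent is used with step 1/L.
    D is a list of rows (n x k), y has length n.
    """
    n, k = len(D), len(D[0])
    # Upper bound on the gradient's Lipschitz constant:
    # squared Frobenius norm of D (>= squared spectral norm) plus lam/mu.
    L = sum(d * d for row in D for d in row) + lam / mu
    x = [0.0] * k
    for _ in range(iters):
        # Residual r = D x - y
        r = [sum(D[i][j] * x[j] for j in range(k)) - y[i] for i in range(n)]
        g = huber_grad(x, mu)
        # Gradient of the smoothed objective: D^T r + lam * g
        grad = [sum(D[i][j] * r[i] for i in range(n)) + lam * g[j]
                for j in range(k)]
        x = [x[j] - grad[j] / L for j in range(k)]
    return x
```

With an identity dictionary, the result should approximately match soft-thresholding of y by lam: large entries shrink by about lam and small entries are driven close to zero (not exactly zero, since the surrogate is smooth).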
