Sparse Coding

Sparse modelling constructs efficient representations of data as combinations of a few typical patterns (atoms) learned from the data itself. Significant contributions have been made to the theory and practice of learning such collections of atoms (usually called dictionaries) and of representing actual data in terms of them, leading to state-of-the-art results in many signal processing, image processing, and data analysis tasks. Sparse coding is the process of computing the representation coefficients x of a given signal y with respect to a given dictionary D. Exact determination of the sparsest representation is an NP-hard problem [1]. This report briefly describes some of the approaches in this area, ranging from greedy algorithms to l1-optimization, all the way to the simultaneous learning of adaptive dictionaries and the corresponding representation vectors.

1. Problem Statement

Using a dictionary matrix D ∈ R^(n×k) that contains k atoms {d_j}, j = 1, ..., k, as its columns, a signal y ∈ R^n can be represented as a sparse linear combination of these atoms; the representation may be exact (y = Dx) or approximate (y ≈ Dx). (The dictionary referred to throughout this report is an overcomplete dictionary, with k > n.) The vector x ∈ R^k holds the representation coefficients of the signal y. The problem at hand is finding the sparsest representation x, which is the solution of either

    min_x ‖x‖_0  subject to  y = Dx            (1)

or

    min_x ‖x‖_0  subject to  ‖y − Dx‖_2 ≤ ε    (2)

where ‖·‖_0 is the l0 norm, counting the nonzero entries of a vector.

2. Solution Approaches

This section briefly describes a few noted approaches to this problem, followed by a detailed description of one prominent solution, the K-SVD algorithm, in the next section.

2.1 Matching Pursuit

Mallat and Zhang [8] proposed a greedy solution that successively approximates y with orthogonal projections onto elements of D. A vector y in a Hilbert space H can be decomposed as

    y = ⟨y, g_γ0⟩ g_γ0 + Ry,
where Ry is the residual vector after approximating y in the direction of g_γ0. Since g_γ0 is orthogonal to Ry,

    ‖y‖^2 = |⟨y, g_γ0⟩|^2 + ‖Ry‖^2.

To minimize ‖Ry‖ we must choose g_γ0 ∈ D such that |⟨y, g_γ0⟩| is maximal. In some cases it is only possible to find a g_γ0 that is almost the best, in the sense that

    |⟨y, g_γ0⟩| ≥ α sup_{γ∈Γ} |⟨y, g_γ⟩|,

where α is an optimality factor satisfying 0 ≤ α ≤ 1. Matching pursuit is an iterative algorithm that sub-decomposes the residual Ry by projecting it onto the vector of D that best matches it, just as was done for y; the procedure is then repeated on each successive residual. Matching pursuit has been shown to outperform DCT-based coding at low bit rates, in both coding efficiency and image quality. Its main drawback is the computational complexity of the encoder. Proposed improvements include approximate dictionary representations and suboptimal ways of choosing the best match at each iteration (atom extraction).

2.2 Orthogonal Matching Pursuit (OMP)

Pati et al. [4] propose a refinement of the matching pursuit (MP) algorithm that improves convergence by adding an orthogonalization step. Compared with MP, the method additionally computes a k-th-order model for y,

    y = Σ_{n=1}^{k} a_n x_n + R_k y,  with ⟨R_k y, x_n⟩ = 0 for n = 1, ..., k.

Since the elements of D are not required to be orthogonal, performing such an update requires an auxiliary model for the dependence of x_{k+1} on x_1, ..., x_k:

    x_{k+1} = Σ_{n=1}^{k} b_n x_n + γ_k,  with ⟨γ_k, x_n⟩ = 0 for n = 1, ..., k.

For a finite dictionary with N elements, OMP is guaranteed to converge to the projection of y onto the span of the dictionary elements in at most N steps.
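To make the two algorithms concrete, the following sketch (in Python with NumPy; the random dictionary, synthetic signal, and sizes are hypothetical, chosen only for illustration) implements MP's greedy atom-selection step and an OMP loop in which the orthogonalization is done by a least-squares refit over all atoms selected so far:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical overcomplete dictionary: n = 20 samples, k = 50 unit-norm atoms.
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)

def matching_pursuit_step(D, residual):
    """One MP step: pick the atom most correlated with the residual."""
    c = D.T @ residual
    j = int(np.argmax(np.abs(c)))
    return j, c[j]

def omp(D, y, n_iter):
    """OMP sketch: MP's greedy selection, plus a least-squares refit over
    all selected atoms so the residual stays orthogonal to each of them."""
    residual, support = y.copy(), []
    coef = np.zeros(0)
    for _ in range(n_iter):
        j, _ = matching_pursuit_step(D, residual)
        if j not in support:
            support.append(j)
        # Orthogonalization step: refit coefficients on the whole support.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x, residual

# Synthetic noiseless 3-sparse signal, then recovery with 3 iterations.
x_true = np.zeros(50)
x_true[[3, 17, 41]] = [1.5, -2.0, 0.7]
y = D @ x_true
x_hat, r = omp(D, y, n_iter=3)

# MP energy split for the first step: ||y||^2 = |<y, g>|^2 + ||Ry||^2,
# which holds because the atoms are unit-norm and Ry is orthogonal to g.
j, c = matching_pursuit_step(D, y)
Ry = y - c * D[:, j]
assert np.isclose(y @ y, c**2 + Ry @ Ry)
```

The least-squares refit inside the loop is exactly what distinguishes OMP from MP: after it, the residual is orthogonal to every atom in the support, not merely to the most recent one.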

[1] Yann LeCun et al., "Learning Fast Approximations of Sparse Coding," ICML, 2010.

[2] Bruno A. Olshausen et al., "Sparse coding of sensory inputs," Current Opinion in Neurobiology, 2004.

[3] Marc Teboulle et al., "A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems," SIAM J. Imaging Sci., 2009.

[4] Y. C. Pati et al., "Orthogonal matching pursuit: recursive function approximation with applications to wavelet decomposition," Proceedings of the 27th Asilomar Conference on Signals, Systems and Computers, 1993.

[5] Guillermo Sapiro et al., "Discriminative learned dictionaries for local image analysis," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2008.

[6] A. Bruckstein et al., "K-SVD: An Algorithm for Designing of Overcomplete Dictionaries for Sparse Representation," 2005.

[7] S. Mallat et al., "Adaptive greedy approximations," 1997.

[8] Stéphane Mallat et al., "Matching pursuits with time-frequency dictionaries," IEEE Trans. Signal Process., 1993.

[9] Michael A. Saunders et al., "Atomic Decomposition by Basis Pursuit," SIAM J. Sci. Comput., 1998.