Smoothed SimCO for dictionary learning: Handling the singularity issue

Typical dictionary learning algorithms iterate between two steps: sparse approximation and dictionary update. This paper focuses on the latter. While various algorithms have been proposed for the dictionary update, global optimality is generally not guaranteed. Interestingly, the main reason an optimization procedure fails to converge to a global optimum is often not local minima or saddle points but singular points, at which the objective function is discontinuous. To address this singularity issue, we propose the so-called smoothed SimCO, in which the original objective function is replaced with a continuous counterpart. It can be proved that, in the limit, the new objective function is the best possible lower semi-continuous approximation of the original one. A Newton conjugate gradient (CG) method is implemented to solve the corresponding optimization problem. Simulations demonstrate that the proposed method significantly improves performance.
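To make the idea concrete, below is a minimal Python sketch of a smoothed dictionary-update objective minimized with SciPy's Newton-CG solver. It is not the paper's exact formulation: the Tikhonov-style smoothing term mu*||X||_F^2 (which keeps the inner least-squares problem well-posed even when a sub-dictionary is rank-deficient), the envelope-theorem gradient, and the unconstrained treatment of the dictionary (ignoring the unit-norm constraint on atoms) are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def smoothed_objective_and_grad(d_flat, Y, support, shape, mu):
    """Smoothed objective f_mu(D) = min_X ||Y - D X||_F^2 + mu ||X||_F^2,
    with X restricted to a fixed sparsity pattern: support[k] lists the
    atom indices used by signal k. Returns (value, gradient w.r.t. D)."""
    n, m = shape
    D = d_flat.reshape(n, m)
    X = np.zeros((m, Y.shape[1]))
    for k in range(Y.shape[1]):
        idx = support[k]
        Dk = D[:, idx]
        # Ridge-regularized inner solve: (Dk^T Dk + mu I) x = Dk^T y.
        # The mu I term removes the singularity when Dk is rank-deficient.
        x = np.linalg.solve(Dk.T @ Dk + mu * np.eye(len(idx)), Dk.T @ Y[:, k])
        X[idx, k] = x
    R = Y - D @ X
    f = np.sum(R**2) + mu * np.sum(X**2)
    # Envelope theorem: since X minimizes the inner problem, the gradient
    # w.r.t. D is simply -2 (Y - D X) X^T, with no need to differentiate
    # through the inner solve.
    g = (-2.0 * R @ X.T).ravel()
    return f, g

# Toy usage; all sizes and data are illustrative.
rng = np.random.default_rng(0)
n, m, N, s, mu = 16, 32, 200, 4, 1e-2
Y = rng.standard_normal((n, N))
D0 = rng.standard_normal((n, m))
D0 /= np.linalg.norm(D0, axis=0)  # start from unit-norm atoms
support = [rng.choice(m, s, replace=False) for _ in range(N)]

res = minimize(smoothed_objective_and_grad, D0.ravel(), jac=True,
               args=(Y, support, (n, m), mu), method="Newton-CG")
D_new = res.x.reshape(n, m)
```

Newton-CG only needs the gradient here; SciPy approximates Hessian-vector products by finite differences of the gradient, which matches the second-order flavor of the method described in the abstract.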
