Fast Dictionary Learning for Sparse Representations of Speech Signals

For certain types of dictionary-based decompositions, it has been observed that there may be a link between sparsity in the dictionary and sparsity in the decomposition. Sparsity in the dictionary has also been associated with the derivation of fast and efficient dictionary learning algorithms. In this paper we therefore present a greedy adaptive dictionary learning algorithm that sets out to find sparse atoms for speech signals. The algorithm learns the dictionary atoms from data frames taken from a speech signal. At each iteration it extracts the data frame with the minimum sparsity index and adds it to the dictionary matrix as a new atom. The contribution of this atom to the data frames is then removed, and the process is repeated. The algorithm is found to yield a sparse signal decomposition, supporting the hypothesis of a link between sparsity in the decomposition and sparsity in the dictionary. The algorithm is applied to the problems of speech representation and speech denoising, and its performance is compared to that of existing methods. The method is shown to find dictionary atoms that are sparser than their time-domain waveform, and also to result in a sparser speech representation. In the presence of noise, the algorithm is found to perform similarly to the well-established principal component analysis.
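The greedy loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the sparsity index, so the sketch assumes the ratio of the l1 norm to the l2 norm of each frame (smaller means sparser), and it assumes that "removing the contribution" of an atom means subtracting the projection of every frame onto that atom. The function name `greedy_adaptive_dictionary`, its parameters, and the tolerance `eps` are hypothetical.

```python
import numpy as np

def greedy_adaptive_dictionary(frames, n_atoms, eps=1e-10):
    """Greedy dictionary learning sketch.

    frames  : array of shape (frame_len, n_frames), one data frame per column.
    n_atoms : maximum number of atoms to extract.
    eps     : relative tolerance below which a residual frame counts as empty
              (an assumption of this sketch, used to avoid division by zero).
    Returns (dictionary, residual), with one unit-norm atom per dictionary column.
    """
    residual = np.array(frames, dtype=float)  # working copy of the data frames
    frame_len, n_frames = residual.shape
    tol = eps * np.linalg.norm(residual, axis=0).max(initial=0.0)
    atoms = []
    for _ in range(n_atoms):
        l1 = np.abs(residual).sum(axis=0)
        l2 = np.linalg.norm(residual, axis=0)
        active = l2 > tol
        if not active.any():
            break  # nothing left to represent
        # Assumed sparsity index: l1 norm divided by l2 norm, per frame.
        index = np.full(n_frames, np.inf)
        index[active] = l1[active] / l2[active]
        k = int(np.argmin(index))  # frame with minimum sparsity index
        atom = residual[:, k] / l2[k]  # normalise to unit l2 norm
        atoms.append(atom)
        # Remove the atom's contribution from every frame (projection step).
        residual -= np.outer(atom, atom @ residual)
    dictionary = np.column_stack(atoms) if atoms else np.empty((frame_len, 0))
    return dictionary, residual
```

Under these assumptions each new atom is drawn from a residual that is already orthogonal to the earlier atoms, so the learned atoms are mutually orthogonal and the coefficients of a frame `x` can be computed as `dictionary.T @ x`.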
