Optimization of learned dictionary for sparse coding in speech processing

As a promising technique, sparse coding has been widely used for the analysis, representation, compression, denoising and separation of speech. This technique needs a good dictionary whose atoms can represent speech signals. Although many methods have been proposed to learn such a dictionary, two problems remain. First, unimportant atoms add a heavy computational load to sparse decomposition and reconstruction, which keeps sparse coding from real-time applications. Second, in speech denoising and separation, harmful atoms contribute little or nothing to reducing the sparsity degree but increase source confusion, resulting in severe distortion. To solve these two problems, we first analyze the inherent assumptions of sparse coding and show that distortion can arise when these assumptions do not hold. Next, we propose two methods that optimize a given dictionary by removing unimportant atoms and harmful atoms, respectively. Experiments show that the proposed methods can further improve the performance of dictionaries.

Highlights
- Analyze the assumptions of sparse coding.
- Analyze the distortion of reconstructed signals in theory.
- Propose two optimization methods which improve a given dictionary by atom selection, rather than by providing an improved dictionary learning method.
- Present several measures for dictionary evaluation.
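To make the idea of removing unimportant atoms concrete, here is a minimal Python sketch. It is not the method proposed in the paper: the abstract does not state how importance is measured, so the sketch assumes that an atom is unimportant when it is rarely selected while sparse-coding a set of training frames. The function names `omp` and `prune_by_usage`, the threshold `min_usage`, and the toy data are all hypothetical and exist only for illustration.

```python
import numpy as np


def omp(D, x, k):
    """Greedy orthogonal matching pursuit: approximate x with at most k atoms of D."""
    residual = x.astype(float).copy()
    support = []
    sol = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # Re-fit all selected atoms jointly by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        if np.linalg.norm(residual) < 1e-10:
            break
    coef = np.zeros(D.shape[1])
    coef[support] = sol
    return coef


def prune_by_usage(D, X, k, min_usage):
    """Assumed criterion (not from the paper): keep only atoms that receive a
    nonzero coefficient in at least min_usage training frames (columns of X)."""
    codes = np.column_stack([omp(D, x, k) for x in X.T])
    usage = np.count_nonzero(codes, axis=1)
    keep = usage >= min_usage
    return D[:, keep], keep


# Toy data: a 4-atom dictionary and three frames that only involve atoms 0 and 1.
D = np.eye(4)
X = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
D_pruned, keep = prune_by_usage(D, X, k=2, min_usage=1)
print(keep, D_pruned.shape)
```

A smaller dictionary reduces the cost of every correlation step `D.T @ residual`, which is where the computational saving mentioned in the abstract would come from. Removing harmful atoms for denoising or separation would need a different criterion, for example one that also looks at how well an atom represents the competing source, and is not covered by this sketch.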
