Sparse Modeling with Applications to Speech Processing: A Survey

Recent years have seen growing interest in the sparse approximation of signals. Given an over-complete dictionary of prototype signals, called atoms, a signal is described as a sparse linear combination of these atoms, that is, a combination in which only a few atoms have nonzero coefficients. Sparse representations are used in many applications, including compression, source separation, enhancement, regularization in inverse problems, and feature extraction. This article presents a literature review of sparse coding applications in the field of speech processing.
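As a rough illustration of what "a sparse linear combination of atoms" means in practice, the following is a minimal sketch of one greedy sparse coding scheme in the style of orthogonal matching pursuit. It is not taken from the survey itself: the function name `omp`, the random Gaussian dictionary, and the sizes `n`, `m`, and `k` are all assumptions chosen for illustration only.

```python
import numpy as np

def omp(D, x, k):
    """Greedy sketch: approximate x using at most k columns (atoms) of D."""
    residual = x.copy()
    support = []                      # indices of the atoms selected so far
    sol = np.zeros(0)                 # coefficients on the selected atoms
    for _ in range(k):
        # Select the atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Refit all selected coefficients by least squares.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coeffs = np.zeros(D.shape[1])
    coeffs[support] = sol
    return coeffs

# Hypothetical over-complete dictionary: 64 unit-norm atoms in 32 dimensions.
rng = np.random.default_rng(0)
n, m, k = 32, 64, 3
D = rng.standard_normal((n, m))
D /= np.linalg.norm(D, axis=0)

# Synthetic signal built from three atoms (indices chosen arbitrarily).
true = np.zeros(m)
true[[5, 20, 41]] = [1.5, -2.0, 0.8]
x = D @ true

a = omp(D, x, k)
print("nonzero coefficients:", np.count_nonzero(a))
print("residual norm:", np.linalg.norm(x - D @ a))
```

The dictionary here is random; in the speech applications the survey covers, the dictionary would typically be a fixed transform or be learned from data, and the sparse solver may be a convex relaxation rather than a greedy method.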
