A Robust Time-Frequency Decomposition Model for Suppression of Mixed Gaussian-Impulse Noise in Audio Signals

In this paper, we propose a robust time-frequency decomposition (RTFD) model to restore audio signals degraded by sparse impulse noise mixed with small dense Gaussian noise. This kind of noise is common, especially in old recordings. The RTFD model rests on the observation that such degraded signals consist mainly of four parts: a quasi-periodic voiced part, an aperiodic transient part, arbitrarily large impulse noise, and small dense Gaussian noise. We exploit the sparsity and local correlations of the corresponding parts to solve the RTFD model. We also develop, heuristically, a discriminative orthogonal matching pursuit (DOMP) algorithm to estimate the sparse representation vectors more precisely. Specifically, DOMP divides the atom set into two subsets, an active subset and a passive subset, and treats the atoms in the two subsets differently because their sparsity regularization terms are not equally weighted. Based on RTFD and DOMP, we develop two algorithms: a fidelity-oriented algorithm and an articulation-oriented algorithm. The proposed algorithms perform well on both synthetic and real noisy signals. Results show that the articulation-oriented algorithm using DOMP clearly outperforms the other algorithms under heavier impulse noise.
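
To make the idea of sparse decomposition under mixed noise concrete, the following is a minimal sketch and not the RTFD model or the DOMP algorithm themselves. It runs plain orthogonal matching pursuit on a toy signal made of a tonal part, one large impulse, and small Gaussian noise. Everything specific here is an assumption chosen for illustration: the function name `omp`, the dictionary (DCT atoms for the tonal part concatenated with identity atoms for impulses), the frame length, the amplitudes, and the noise level. The transient part of the four-part model is omitted, and all atoms are treated equally, whereas DOMP weights its active and passive subsets differently.

```python
import numpy as np

def omp(D, y, k):
    """Plain orthogonal matching pursuit: approximate y with k unit-norm columns of D."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = 0.0  # do not reselect an atom
        support.append(int(np.argmax(corr)))
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

# Hypothetical toy setup: orthonormal DCT-II atoms model the tonal part,
# identity atoms model impulses.
n = 64
t = np.arange(n)
C = np.sqrt(2.0 / n) * np.cos(np.pi * (t[None, :] + 0.5) * t[:, None] / n)
C[0, :] /= np.sqrt(2.0)          # row k of C is the k-th DCT basis vector
D = np.hstack([C.T, np.eye(n)])  # columns are atoms: [DCT | identity]

rng = np.random.default_rng(0)
tonal = 3.0 * C[5] - 2.0 * C[12]       # sparse in the DCT atoms
impulse = np.zeros(n)
impulse[20] = 5.0                      # one large impulse
noise = 0.01 * rng.standard_normal(n)  # small dense Gaussian noise
y = tonal + impulse + noise

x = omp(D, y, k=3)
tonal_hat = D[:, :n] @ x[:n]  # estimate of the clean tonal part
impulse_hat = x[n:]           # estimate of the impulse noise
```

In this sketch the impulse is absorbed by an identity atom, so the estimate `tonal_hat` is built from the DCT atoms alone. The sparsity level `k=3` is supplied by hand; a practical method would need a stopping rule or a regularization weight instead.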
