Enhancement of noisy speech by temporal and spectral processing

This paper presents a method for enhancing noisy speech that combines linear prediction (LP) residual weighting in the time domain with spectral processing in the frequency domain. The aim is better noise suppression as well as better enhancement of the speech regions. The noisy speech is first processed by excitation source (LP residual) based temporal processing, which identifies and enhances the speech-specific features of the excitation source at the gross and fine temporal levels. The gross-level features are identified by estimating three parameters from the noisy speech signal: the sum of the peaks in the discrete Fourier transform (DFT) spectrum, the smoothed Hilbert envelope of the LP residual, and the modulation spectrum values. The fine-level features are identified using knowledge of the instants of significant excitation. The gross and fine weight functions are combined into a single weight function, which is applied to the LP residual to obtain the temporally processed speech signal. The temporally processed speech is then subjected to spectral domain processing. This stage estimates and removes the degrading components, and it also identifies and enhances the speech-specific spectral components. The proposed method is evaluated using several objective and subjective quality measures. These measures show that the combined temporal and spectral processing provides better enhancement than either temporal or spectral processing alone.
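The following minimal sketch illustrates the excitation-source side of the temporal processing under stated assumptions. It computes the LP residual with the autocorrelation method (Levinson-Durbin recursion) and takes the Hilbert envelope of the residual through the DFT-based analytic signal. It then smooths that envelope, maps it to a weight, and re-excites the all-pole LP filter with the weighted residual.

The sketch covers only one of the three gross-level parameters, the smoothed Hilbert envelope. It omits the DFT-peak and modulation-spectrum measures, the fine-level weighting around the instants of significant excitation, and the spectral processing stage.

The following are hypothetical simplifications for illustration, not the paper's actual weight function or analysis settings:

- the function names,
- the single LP fit over the whole signal, rather than short-time frame-wise analysis,
- the moving-average smoother,
- the linear mapping to `[w_min, 1]`.

The DFT is a naive O(N^2) loop so that the example needs only the standard library.

```python
import cmath
import math


def lp_coefficients(x, order):
    """Autocorrelation-method LP via the Levinson-Durbin recursion.

    Returns [a_1, ..., a_p] such that x[n] is predicted by sum_k a_k * x[n - k].
    """
    n = len(x)
    r = [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(order + 1)]
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k  # prediction error power shrinks at each order
    return a[1:]


def lp_residual(x, a):
    """Inverse filter A(z): e[n] = x[n] - sum_k a_k x[n - k], zero initial state."""
    return [x[n] - sum(a[k] * x[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0)
            for n in range(len(x))]


def lp_synthesis(e, a):
    """All-pole filter 1/A(z): y[n] = e[n] + sum_k a_k y[n - k], zero initial state."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[k] * y[n - k - 1] for k in range(len(a)) if n - k - 1 >= 0))
    return y


def dft(x, inverse=False):
    """Naive O(N^2) DFT, kept dependency-free; use an FFT for real signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
           for k in range(n)]
    return [v / n for v in out] if inverse else out


def hilbert_envelope(x):
    """Magnitude of the analytic signal: double positive bins, zero negative bins."""
    n = len(x)
    spec = dft(x)
    for k in range(1, n):  # the DC bin (k = 0) keeps weight 1
        if n % 2 == 0 and k == n // 2:
            continue  # the Nyquist bin (even N) also keeps weight 1
        spec[k] *= 2.0 if k < (n + 1) // 2 else 0.0
    return [abs(z) for z in dft(spec, inverse=True)]


def moving_average(x, width):
    """Centred moving average; the window is truncated at the signal edges."""
    half = width // 2
    out = []
    for n in range(len(x)):
        seg = x[max(0, n - half): n + half + 1]
        out.append(sum(seg) / len(seg))
    return out


def gross_weight(env, w_min=0.1):
    """HYPOTHETICAL mapping: scale the smoothed envelope linearly into [w_min, 1]."""
    lo, hi = min(env), max(env)
    span = (hi - lo) or 1.0  # a flat envelope maps to w_min everywhere
    return [w_min + (1.0 - w_min) * (v - lo) / span for v in env]


def temporal_enhance(x, order=10, smooth=21, w_min=0.1):
    """Weight the LP residual by a gross-level weight and re-excite the LP filter.

    Simplification (assumption): one LP fit over the whole signal, not frame-wise.
    """
    a = lp_coefficients(x, order)
    e = lp_residual(x, a)
    w = gross_weight(moving_average(hilbert_envelope(e), smooth), w_min)
    y = lp_synthesis([wi * ei for wi, ei in zip(w, e)], a)
    return y, w
```

For a quick check, call `temporal_enhance(x, w_min=1.0)`. This applies a unit weight everywhere, so the output should match the input up to rounding, which confirms that the inverse and synthesis filters are matched. Regions with strong residual excitation receive weights near 1. Low-envelope regions are scaled down towards `w_min`.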
