Speech Enhancement Using Harmonic Emphasis and Adaptive Comb Filtering

An enhancement method for single-channel speech degraded by additive noise is proposed. A spectral weighting function is derived by constrained optimization to suppress noise in the frequency domain. The suppression gain includes two design parameters: the frequency-dependent noise-flooring parameter (FDNFP) and the gain factor. The FDNFP controls the level of admissible residual noise in the enhanced speech. Enhanced harmonic structures are incorporated into the FDNFP by time-domain processing of the linear prediction residuals of voiced speech. The harmonics are further enhanced by adaptive comb filtering, which is derived using the gain factor together with a peak-picking algorithm. The performance of the enhancement method was evaluated using the modified Bark spectral distortion (MBSD), ITU-T Perceptual Evaluation of Speech Quality (PESQ) scores, composite objective measures, and listening tests. Experimental results indicate that the proposed method outperforms three reference methods, particularly at lower signal-to-noise ratios: spectral subtraction, a signal subspace method applicable to both white and colored noise, and a perceptually based enhancement method with a constant noise-flooring parameter. In the listening test, 16 listeners preferred the proposed approach over each of the other three approaches about 73% of the time on average.
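To make the role of a frequency-dependent noise floor concrete, the following is a minimal sketch under stated assumptions. It uses a generic power-spectral-subtraction style rule as a stand-in; it is not the gain derived by constrained optimization in the paper, whose exact form the abstract does not give. The function name `floored_gain`, its parameters, and the example values are all hypothetical.

```python
def floored_gain(noisy_power, noise_power, floor, gain_factor=1.0):
    """Per-bin suppression gain with a frequency-dependent floor.

    Hypothetical illustration only: a spectral-subtraction style rule,
    not the gain function derived in the paper.
    """
    if not (len(noisy_power) == len(noise_power) == len(floor)):
        raise ValueError("all inputs must have one value per frequency bin")
    gains = []
    for y, n, beta in zip(noisy_power, noise_power, floor):
        # Subtract the scaled noise estimate; guard against division by zero.
        raw = 1.0 - gain_factor * n / y if y > 0.0 else 0.0
        # The floor beta limits how strongly this bin can be attenuated,
        # which sets the level of residual noise left in that bin.
        gains.append(max(raw, beta))
    return gains


# Three bins: speech-dominated, mixed, and noise-dominated.
noisy = [10.0, 4.0, 1.0]   # assumed noisy-speech power per bin
noise = [1.0, 2.0, 1.0]    # assumed noise power estimate per bin
floor = [0.05, 0.1, 0.2]   # assumed per-bin floor values
gains = floored_gain(noisy, noise, floor)
```

In this sketch the floor only takes effect in the noise-dominated bin, where the unfloored gain would fall to zero. Allowing the floor to vary per bin is what would let such a scheme treat harmonic and inter-harmonic regions differently.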
