Phoneme-Dependent Speech Enhancement

Most current speech enhancement systems rely on generalized weighting rules that depend only on the signal-to-noise ratio and do not take into account the characteristics of the speech sound being processed. This contribution is concerned with phoneme-specific speech enhancement, in which the signal processing is tailored to the sound class. The first algorithm proposed in this work, fricative spreading, enhances high-frequency unvoiced sounds for bandlimited speech transmission. It detects different fricatives using a vector quantization codebook and then applies a suitable spectral compression function that maps high-frequency energy from above the transmission bandwidth limit into lower-frequency regions still within the transmission bandwidth. The second approach, formant boosting, enhances voiced speech. Using the codebook classification from fricative spreading, voiced phonemes are identified and accentuated by boosting the formant regions and attenuating the regions between the formant frequencies.
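The fricative spreading idea can be illustrated with a minimal sketch. It is not the implementation from this work: the nearest-neighbour codebook lookup, the linear bin mapping, and all names (`nearest_codeword`, `compress_spectrum`, `spread_fricative`, `start_bin`, the `"fricative"` label) are assumptions chosen for illustration. The sketch classifies one frame against a labelled codebook and, if the frame is a fricative, folds the power above the transmission cutoff into the upper part of the passband.

```python
import numpy as np


def nearest_codeword(feature, codebook):
    """Index of the codebook row closest to `feature` (Euclidean distance)."""
    distances = np.linalg.norm(codebook - feature, axis=1)
    return int(np.argmin(distances))


def compress_spectrum(power, cutoff_bin, start_bin):
    """Fold power from bins >= cutoff_bin into bins [start_bin, cutoff_bin).

    Assumption: a simple linear bin mapping stands in for the
    "suitable spectral compression function" of the text.
    """
    n_bins = power.shape[0]
    out = power[:cutoff_bin].astype(float)  # astype returns a copy
    span_src = n_bins - cutoff_bin          # width of the out-of-band region
    span_dst = cutoff_bin - start_bin       # width of the target region
    if span_src <= 0 or span_dst <= 0:
        return out
    src = np.arange(cutoff_bin, n_bins)
    dst = start_bin + ((src - cutoff_bin) * span_dst) // span_src
    # Several source bins can land on one target bin, so accumulate.
    np.add.at(out, dst, power[src])
    return out


def spread_fricative(power, feature, codebook, labels, cutoff_bin, start_bin):
    """Classify the frame; compress only if it is labelled a fricative."""
    label = labels[nearest_codeword(feature, codebook)]
    if label == "fricative":
        return label, compress_spectrum(power, cutoff_bin, start_bin)
    return label, power[:cutoff_bin].astype(float)


# Hypothetical two-entry codebook over a 2-D feature vector.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
labels = ["voiced", "fricative"]
power = np.ones(16)  # flat 16-bin power spectrum

label, band = spread_fricative(power, np.array([0.9, 1.1]),
                               codebook, labels, cutoff_bin=8, start_bin=4)
```

Because the mapping only moves power between bins, the total power of a fricative frame is preserved inside the passband, whereas a non-fricative frame is simply truncated at the cutoff.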
