A statistical framework for artificial bandwidth extension exploiting speech waveform and phonetic transcription

In the past, artificial bandwidth extension (ABWE) has primarily been investigated as a way to enhance transmitted narrowband speech signals at the receiving side. State-of-the-art schemes improve quality over narrowband speech; however, a clear gap to wideband speech is still reported. This is largely due to insufficient ABWE performance on fricatives, particularly /s/. We asked ourselves to what extent speech quality could be improved if the currently spoken phoneme were known. In this paper we present a framework that uses phonetic transcriptions as a priori knowledge in addition to the speech waveform. Possible applications are high-quality offline ABWE of telephone, pilot, or historic speech recordings; memory-efficient narrowband speech synthesis followed by ABWE; and extension of narrowband telephone databases to train wideband acoustic models for automatic speech recognition. For the classical conversational telephony application, we also propose an improved ABWE scheme that uses transcription information only during training.
