Artificial bandwidth extension of spectral envelope with temporal clustering

We present a new wideband spectral envelope estimation framework for artificial bandwidth extension. The framework builds temporal clusters of the joint sub-phone patterns of narrowband and wideband speech using a parallel-branch HMM structure. These joint sub-phone patterns define temporally correlated neighborhoods, within each of which a linear prediction filter estimates the spectral features of the wideband signal from the narrowband signal. The framework is compared against a benchmark artificial bandwidth extension algorithm based on vector quantization. Performance is evaluated with three distinct objective metrics and a subjective A/B test.
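The idea of estimating wideband features with a separate linear estimator per temporal cluster can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the cluster label of each frame is already available (in the framework above it would come from the parallel-branch HMM), it replaces the linear prediction filter with a simple frame-wise affine least-squares map, and all names (`fit_cluster_maps`, `estimate_wideband`, `ridge`) are hypothetical.

```python
from collections import defaultdict


def solve(a, b):
    """Solve a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are left untouched.
    m = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def fit_cluster_maps(nb, wb, labels, ridge=1e-6):
    """Fit one affine least-squares map per cluster label (assumed given).

    nb, wb: lists of narrowband / wideband feature vectors, one per frame.
    ridge: small diagonal loading, an assumption added for numerical stability.
    """
    groups = defaultdict(list)
    for x, y, s in zip(nb, wb, labels):
        groups[s].append((list(x) + [1.0], list(y)))  # append a bias term
    maps = {}
    for s, pairs in groups.items():
        d = len(pairs[0][0])
        k = len(pairs[0][1])
        # Normal equations: (X^T X + ridge * I) W = X^T Y
        xtx = [[sum(x[i] * x[j] for x, _ in pairs) + (ridge if i == j else 0.0)
                for j in range(d)] for i in range(d)]
        xty = [[sum(x[i] * y[j] for x, y in pairs) for j in range(k)]
               for i in range(d)]
        maps[s] = solve(xtx, xty)  # d x k weight matrix
    return maps


def estimate_wideband(x, label, maps):
    """Estimate a wideband feature vector with the map of the given cluster."""
    xa = list(x) + [1.0]
    w = maps[label]
    return [sum(xa[i] * w[i][j] for i in range(len(xa)))
            for j in range(len(w[0]))]


# Toy data: two clusters with different narrowband-to-wideband relations.
nb = [[0.0], [1.0], [2.0], [0.0], [1.0], [2.0]]
wb = [[1.0], [3.0], [5.0], [0.0], [-1.0], [-2.0]]
labels = [0, 0, 0, 1, 1, 1]
maps = fit_cluster_maps(nb, wb, labels)
```

Because each cluster gets its own map, the same narrowband input yields different wideband estimates depending on the cluster it falls in, which is the motivation for clustering before estimation.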
