Audio bandwidth extension based on temporal smoothing cepstral coefficients

In this paper, we propose a wideband (WB) to super-wideband (SWB) audio bandwidth extension (BWE) method based on temporal smoothing cepstral coefficients (TSCC). The temporal relationship of audio signals is incorporated into feature extraction in the BWE frontend so that the extended spectra evolve more smoothly over time. In the proposed scheme, a Gammatone auditory filter bank decomposes the audio signal, and the energy of each frequency band is smoothed over the long term using minima controlled recursive averaging (MCRA) to suppress transient components. The resulting ‘steady-state’ spectrum is frequency weighted, and the TSCC are then obtained by applying a power-law loudness function followed by cepstral normalization. The extracted TSCC are fed into a Gaussian mixture model (GMM)-based Bayesian estimator to estimate the high-frequency (HF) spectral envelope, while the fine structure is restored by spectral translation. Evaluation results show that the TSCC exploit the temporal relationship of audio signals and provide higher mutual information between the low- and high-frequency parameters without increasing the dimension of the input vectors in the BWE frontend. In addition, the proposed method is applied to the G.729.1 wideband codec and outperforms the Mel frequency cepstral coefficient (MFCC)-based method in terms of log spectral distortion (LSD), cosh measure, and differential log spectral distortion. Furthermore, the proposed method improves the smoothness of the reconstructed spectrum over time and performs well in subjective listening tests.
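
The feature-extraction chain described above (temporal smoothing of band energies, power-law compression, cepstral transform, normalization) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gammatone filter bank and frequency weighting are omitted, a fixed-factor first-order recursive average stands in for MCRA, and the function names, the smoothing factor `alpha`, the exponent `1/3`, and the use of mean subtraction as the cepstral normalization are all assumptions made for illustration.

```python
import math


def smooth_band_energies(frames, alpha=0.9):
    """First-order recursive averaging of per-band energies over time.

    frames: list of frames, each a list of non-negative band energies.
    alpha:  hypothetical fixed smoothing factor; MCRA would adapt the
            smoothing per band instead of using a constant.
    """
    smoothed = []
    prev = list(frames[0])  # initialise the state with the first frame
    for frame in frames:
        prev = [alpha * p + (1.0 - alpha) * e for p, e in zip(prev, frame)]
        smoothed.append(prev)
    return smoothed


def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n)
            for i, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def smoothed_cepstra(frames, alpha=0.9, exponent=1.0 / 3.0, num_coeffs=4):
    """Smoothed band energies -> power-law compression -> DCT -> mean removal."""
    smoothed = smooth_band_energies(frames, alpha)
    # Power-law compression (assumed exponent) followed by the DCT.
    cepstra = [dct2([e ** exponent for e in frame], num_coeffs)
               for frame in smoothed]
    # Assumed normalization: subtract each coefficient's mean over time.
    means = [sum(c[k] for c in cepstra) / len(cepstra)
             for k in range(num_coeffs)]
    return [[c[k] - means[k] for k in range(num_coeffs)] for c in cepstra]


if __name__ == "__main__":
    # Four bands, five frames; frame 2 carries a transient in band 3.
    frames = [
        [1.0, 2.0, 3.0, 4.0],
        [1.0, 2.0, 3.0, 4.0],
        [1.0, 2.0, 30.0, 4.0],
        [1.0, 2.0, 3.0, 4.0],
        [1.0, 2.0, 3.0, 4.0],
    ]
    for row in smoothed_cepstra(frames):
        print([round(v, 3) for v in row])
```

Because the smoothing acts on band energies before the cepstral transform, the temporal context enters the features without enlarging the feature vector: each frame still yields `num_coeffs` values.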
