Mel-frequency cepstral coefficient-based bandwidth extension of narrowband speech

Abstract: We present a novel MFCC-based scheme for the Bandwidth Extension (BWE) of narrowband speech. BWE is based on the assumption that narrowband speech (0.3–3.4 kHz) correlates closely with the highband signal (3.4–7 kHz), enabling estimation of the highband frequency content given the narrow band. While BWE schemes have traditionally used LP-based parametrizations, our recent work has shown that MFCC parametrization results in higher correlation between both bands, reaching twice that obtained using LSFs. By employing high-resolution IDCT of highband MFCCs obtained from narrowband MFCCs by statistical estimation, we achieve high-quality highband power spectra from which the time-domain speech signal can be reconstructed. Implementing this scheme for BWE translates the higher correlation advantage of MFCCs into BWE performance superior to that obtained using LSFs, as shown by improvements in log-spectral distortion as well as Itakura-based measures (the latter improving by up to 13%).

Index Terms: bandwidth extension, high-resolution IDCT, highband certainty, mutual information, source-filter model
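The high-resolution IDCT step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dct2` and `high_res_idct`, the unnormalized DCT-II scaling, and the toy input values are all assumptions made for illustration. The idea shown is only that a cosine series defined by a set of cepstral coefficients can be evaluated on a finer grid than the original mel filterbank, giving an interpolated log mel-band spectrum.

```python
import math

def dct2(log_energies, num_coeffs):
    """Unnormalized DCT-II of log filterbank energies (assumed MFCC convention)."""
    M = len(log_energies)
    return [
        sum(e * math.cos(math.pi * k * (m + 0.5) / M)
            for m, e in enumerate(log_energies))
        for k in range(num_coeffs)
    ]

def high_res_idct(mfcc, num_filters, num_points):
    """Evaluate the inverse cosine series at num_points positions on the mel axis.

    num_filters is the filterbank size M used in the forward DCT;
    num_points > num_filters gives a finer (interpolated) log spectrum.
    """
    out = []
    for j in range(num_points):
        x = (j + 0.5) / num_points  # normalized position in (0, 1)
        s = mfcc[0] + 2.0 * sum(
            c * math.cos(math.pi * k * x)
            for k, c in enumerate(mfcc) if k > 0
        )
        out.append(s / num_filters)
    return out

# Hypothetical log mel-band energies for one frame (illustrative values only).
log_e = [1.0, 2.5, 0.3, -1.2, 0.7]
coeffs = dct2(log_e, len(log_e))
fine = high_res_idct(coeffs, len(log_e), 3 * len(log_e))
```

With no cepstral truncation and a grid three times finer, every third output sample (`fine[1]`, `fine[4]`, ...) coincides with the original band values, and the samples in between are cosine interpolations. The output is still a log spectrum on a mel-warped axis; converting it to a linear-frequency power spectrum and resynthesizing speech require further steps not shown here.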
