The Effect of Memory Inclusion on Mutual Information Between Speech Frequency Bands

In this paper, we investigate the effect of temporal correlation on the dependence between the narrowband and highband frequency ranges of speech, covering 0.3-3.4 kHz and 3.7-8 kHz, respectively. We adopt the technique of modelling spectral envelopes, represented by Mel-frequency cepstral coefficients, with Gaussian mixture models. The dependence between the two disjoint bands is quantified through mutual information (MI) and through the ratio of MI to highband entropy. Speech exhibits considerable temporal correlation that static parametrization of spectral envelopes does not explicitly account for. Including memory in the parametrization, through delta features, brings this temporal information into the model; MI gains are therefore expected, and with them better bandwidth extension performance. Results show that exploiting delta features can increase certainty about the highband (the ratio of MI to highband entropy) by as much as 216% in relative terms, corresponding to an absolute increase of 12%.
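As an illustration of what "including memory through delta features" can mean in practice, the sketch below appends delta coefficients to a sequence of static feature vectors (for example, MFCC frames). It assumes the common linear-regression form of the delta computation with edge frames repeated at the boundaries; the abstract does not state which delta formula or window length was used, so the function names `delta_features` and `append_deltas` and the `window` parameter are hypothetical choices made for this example only.

```python
from typing import List


def delta_features(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Regression-based delta coefficients for a sequence of feature frames.

    Assumed formula (a common convention, not taken from the paper):
        d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond the sequence boundaries are replaced by the nearest edge frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    if not frames:
        return []

    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2.0 * sum(n * n for n in range(1, window + 1))

    def frame_at(t: int) -> List[float]:
        # Clamp the index so that edge frames are repeated at the boundaries.
        return frames[min(max(t, 0), num_frames - 1)]

    deltas = []
    for t in range(num_frames):
        row = []
        for k in range(dim):
            acc = sum(
                n * (frame_at(t + n)[k] - frame_at(t - n)[k])
                for n in range(1, window + 1)
            )
            row.append(acc / denom)
        deltas.append(row)
    return deltas


def append_deltas(frames: List[List[float]], window: int = 2) -> List[List[float]]:
    """Concatenate each static frame with its delta vector (static + memory)."""
    deltas = delta_features(frames, window)
    return [static + delta for static, delta in zip(frames, deltas)]
```

Under these assumptions, each augmented vector has twice the dimension of the static one, and a model fitted on the augmented vectors sees local temporal slope as well as the instantaneous envelope.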
