Variable-dimension quantization of sinusoidal amplitudes using Gaussian mixture models

In this paper, Gaussian mixture (GM) models are used to design variable-dimension quantizers under a weighted distortion criterion. A general method is proposed that combines a variable-to-fixed dimension transform with GM modeling and quantization. The method provides a convenient and efficient way to encode the amplitudes in a sinusoidal speech coder. Quantizers designed with the proposed scheme are evaluated both according to weighted distortion criteria and with respect to a high-rate bound approximation of the distortion. Informal listening tests suggest that the amplitudes can be encoded without subjective loss in a wideband harmonic coder at a rate of around 40 bits per frame (for the amplitudes only).
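To make the variable-to-fixed dimension step concrete, the following is a minimal sketch and not the transform used in the paper, which the abstract does not specify. It assumes linear interpolation on a normalized frequency axis as the transform, and a normalized weighted squared error as the distortion measure. The function names `to_fixed`, `to_variable`, and `weighted_distortion`, and the fixed dimension of 16, are hypothetical choices made for illustration only.

```python
import numpy as np


def to_fixed(amps, n_fixed=16):
    """Resample a variable-length amplitude vector to n_fixed points.

    Assumption: linear interpolation on a normalized [0, 1] axis stands in
    for the variable-to-fixed dimension transform.
    """
    amps = np.asarray(amps, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(amps))
    dst = np.linspace(0.0, 1.0, num=n_fixed)
    return np.interp(dst, src, amps)


def to_variable(fixed, n_harmonics):
    """Map a fixed-dimension vector back to n_harmonics amplitudes."""
    fixed = np.asarray(fixed, dtype=float)
    src = np.linspace(0.0, 1.0, num=len(fixed))
    dst = np.linspace(0.0, 1.0, num=n_harmonics)
    return np.interp(dst, src, fixed)


def weighted_distortion(x, x_hat, w):
    """Weighted squared error between x and x_hat, normalized by sum(w)."""
    x, x_hat, w = (np.asarray(a, dtype=float) for a in (x, x_hat, w))
    return float(np.sum(w * (x - x_hat) ** 2) / np.sum(w))


# Example: a frame with 10 harmonics mapped to 16 dimensions and back.
amps = np.linspace(0.0, 1.0, num=10)   # hypothetical amplitude ramp
fixed = to_fixed(amps)                 # fixed-dimension representation
recon = to_variable(fixed, len(amps))  # back to the original dimension
weights = np.ones(len(amps))           # uniform weights as a placeholder
error = weighted_distortion(amps, recon, weights)
```

In a complete coder of the kind the abstract describes, a GM-based quantizer would operate on the fixed-dimension vector between the two mappings. That stage is omitted here because its details are not given in the abstract.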
