Paper information - KLT-based adaptive entropy-constrained vector quantization for the speech signals


For efficient variable-rate speech coding, Karhunen-Loève transform based adaptive entropy-constrained vector quantization (KLT-AECVQ) is proposed. The proposed method consists of backward-adaptive linear predictive coding (LPC) analysis, KLT estimation based on the LPC coefficients, and lattice vector quantization followed by Huffman coding according to the KLT statistics. Because different statistics in the original-signal domain can be mapped onto identical statistics in the KLT domain, a few classified Huffman codebooks are sufficient to represent the KLT-domain source statistics. KLT-AECVQ with 32 Huffman codebooks achieves rate-distortion performance comparable to that of the theoretically optimal AECVQ with an infinite number of Huffman codebooks. KLT-AECVQ also yields better perceptual quality than KLT-based classified vector quantization (KLTCVQ), which in turn yielded better quality than a conventional code-excited linear predictive (CELP) codec. Under a five-sample delay constraint, the complexity of KLT-AECVQ is also three times lower than that of the CELP codec.
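The processing chain named in the abstract (LPC model, KLT estimated from the LPC coefficients, lattice quantization, Huffman coding) can be illustrated with a minimal Python sketch. This is not the authors' implementation, and several parts are assumptions made only for illustration: the covariance is derived from the truncated impulse response of the all-pole filter 1/A(z), the lattice is the scaled integer lattice Z^n, and a single Huffman table stands in for the classified codebooks. All function names, the LPC sign convention, and the parameters `step`, `n_taps`, and `n_blocks` are hypothetical.

```python
import heapq
import math
import random
from collections import Counter


def lpc_to_covariance(lpc, dim, n_taps=256):
    """Unit-variance Toeplitz covariance implied by the all-pole model 1/A(z),
    with A(z) = 1 - sum_k lpc[k] * z^-(k+1) (sign convention is an assumption)."""
    h = []  # truncated impulse response of 1/A(z)
    for n in range(n_taps):
        acc = 1.0 if n == 0 else 0.0
        for k, coeff in enumerate(lpc):
            if n - k - 1 >= 0:
                acc += coeff * h[n - k - 1]
        h.append(acc)
    r = [sum(h[n] * h[n + lag] for n in range(n_taps - lag)) for lag in range(dim)]
    return [[r[abs(i - j)] / r[0] for j in range(dim)] for i in range(dim)]


def jacobi_eigh(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues and eigenvectors (as rows) of a small symmetric matrix,
    computed with cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(i)) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A J
                    a[k][p], a[k][q] = c * a[k][p] - s * a[k][q], s * a[k][p] + c * a[k][q]
                for k in range(n):  # A <- J^T A
                    a[p][k], a[q][k] = c * a[p][k] - s * a[q][k], s * a[p][k] + c * a[q][k]
                for k in range(n):  # V <- V J
                    v[k][p], v[k][q] = c * v[k][p] - s * v[k][q], s * v[k][p] + c * v[k][q]
    eigvals = [a[i][i] for i in range(n)]
    eigvecs = [[v[k][i] for k in range(n)] for i in range(n)]
    return eigvals, eigvecs


def klt_lattice_indices(x, eigvecs, step):
    """Project x onto the KLT basis, then round onto the scaled integer lattice Z^n."""
    return [round(sum(e * xi for e, xi in zip(vec, x)) / step) for vec in eigvecs]


def huffman_code_lengths(counts):
    """Huffman code length in bits for each symbol of a {symbol: count} table."""
    if len(counts) == 1:
        return {sym: 1 for sym in counts}
    heap = [(cnt, i, [sym]) for i, (sym, cnt) in enumerate(counts.items())]
    heapq.heapify(heap)
    lengths = dict.fromkeys(counts, 0)
    tie = len(heap)  # unique tie-breaker so the symbol lists are never compared
    while len(heap) > 1:
        c1, _, s1 = heapq.heappop(heap)
        c2, _, s2 = heapq.heappop(heap)
        for sym in s1 + s2:  # every merge adds one bit to the merged symbols
            lengths[sym] += 1
        heapq.heappush(heap, (c1 + c2, tie, s1 + s2))
        tie += 1
    return lengths


def demo(lpc=(0.9,), dim=4, step=0.5, n_blocks=2000, seed=0):
    """Code a synthetic all-pole signal; returns (KLT eigenvalues, bits per sample)."""
    rng = random.Random(seed)
    eigvals, eigvecs = jacobi_eigh(lpc_to_covariance(lpc, dim))
    hist, samples = [0.0] * len(lpc), []
    for _ in range(dim * n_blocks):  # Gaussian noise through 1/A(z)
        s = rng.gauss(0.0, 1.0) + sum(c * h for c, h in zip(lpc, hist))
        hist = [s] + hist[:-1]
        samples.append(s)
    symbols = []
    for b in range(n_blocks):
        symbols += klt_lattice_indices(samples[b * dim:(b + 1) * dim], eigvecs, step)
    counts = Counter(symbols)
    lengths = huffman_code_lengths(counts)
    return eigvals, sum(counts[s] * lengths[s] for s in counts) / len(symbols)
```

Calling `demo()` returns the eigenvalues of the model covariance and the average Huffman rate in bits per sample for the synthetic source. In a backward-adaptive setting the LPC coefficients would be re-estimated from previously decoded samples, so that the decoder can derive the same KLT without side information; that step is omitted here.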

Moo Young Kim | M. Kim
