Language-specific phonetic structure and the quantisation of the spectral envelope of speech

Abstract: In the design of low-bit-rate (LBR) speech coding algorithms, language variability is often treated as secondary to other operational factors such as speaker variability and noise. Yet languages differ extensively in the composition of the spectral envelope, and the quantised spectral envelope accounts for an important part of the bit allocation in speech coding, so it is surprising that no comprehensive study has been carried out on the role of language in spectral quantisation. This paper addresses that gap through a series of performance studies of spectral quantisation carried out across a set of language families typical of global mobile telecommunications. The studies consider quantiser design factors such as the size and structure of codebooks and the quantity of monolingual data used in codebook training. Quantisation distortion is found not to be uniform across languages: the behaviour of spectral quantisation differs significantly between languages, in particular in the behaviour of high-distortion outliers. Detailed analysis of the spectral distortion data at the phonetic level reveals that the distribution of spectral energy in phonemes influences the behaviour of monolingual codebooks. Some explanations for codebook performance are presented, together with a set of recommendations for codebook design for multilingual environments.
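To make the notions of spectral distortion and high-distortion outliers concrete, the following minimal sketch computes a per-frame log spectral distortion between an original and a quantised LPC envelope and then summarises the outliers. The abstract does not state which distortion measure or outlier thresholds the study uses; the RMS log-spectral distance and the 2 dB and 4 dB thresholds below are assumptions borrowed from common practice in LPC quantisation work, and all function names are hypothetical.

```python
import numpy as np

def lpc_envelope_db(a, n_fft=512):
    """Log power spectrum (dB) of the all-pole filter 1/A(z).

    `a` holds the LPC polynomial coefficients [1, a1, ..., ap].
    """
    A = np.fft.rfft(a, n_fft)          # A(e^{jw}) sampled on a uniform grid
    return -20.0 * np.log10(np.abs(A))  # 10*log10(1/|A|^2)

def spectral_distortion(a_ref, a_quant, n_fft=512):
    """RMS difference (dB) between two LPC log spectra (assumed measure)."""
    d = lpc_envelope_db(a_ref, n_fft) - lpc_envelope_db(a_quant, n_fft)
    return float(np.sqrt(np.mean(d ** 2)))

def outlier_stats(sd_db):
    """Mean distortion and share of outlier frames (assumed 2 dB / 4 dB bins)."""
    sd = np.asarray(sd_db, dtype=float)
    return {
        "mean_db": float(sd.mean()),
        "pct_2_to_4_db": float(100.0 * np.mean((sd > 2.0) & (sd <= 4.0))),
        "pct_over_4_db": float(100.0 * np.mean(sd > 4.0)),
    }
```

Computing `outlier_stats` separately for each language's test frames, rather than pooling them, is one way to expose the kind of cross-language differences in outlier behaviour that the abstract describes.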
