From Modeling to Perception - Topics in Speech Coding

Transmission and storage of speech require an efficient representation of the speech signal. The conversion of the acoustic speech waveform into a suitable digital representation is called speech coding. This thesis consists of six papers contributing to different research areas in speech coding. Speech coders relying on the source-filter model separate the speech signal into two sets of descriptive parameters: excitation parameters and spectrum parameters. The procedure of representing the parameters with a finite reproduction set is called quantization. Efficient quantization can be obtained by employing vector quantization (VQ), where blocks of parameters are quantized simultaneously. The included papers utilize vector quantization for coding of the excitation and the spectrum parameters. Speech parameters usually exhibit high temporal correlation, and large gains can thus be obtained by exploiting this correlation in the coding process. Methods of this kind are referred to as memory-based VQ. Four papers treat a new method of memory-based VQ, the safety-net method, which is shown to outperform other memory-based schemes in two different speech coding applications: coding of excitation pulses and coding of spectral parameters. The safety-net method is shown to yield high objective and subjective performance, especially for transmission over noisy channels. The perceived speech quality is an important issue in coder design. Perceptual aspects of memory-based VQ for spectrum coding are investigated in two of the papers. A study of temporal masking in voiced speech is presented in one of the papers, and the results clearly indicate attainable quality improvements for coders. One paper presents a new general model-based analysis and design technique for VQ, based on high-rate VQ theory. The theory is applied to spectrum parameters, and new performance bounds for spectrum quantization are presented.