Harmonic coding of speech at low bit rates

Research on the compression of digital speech signals has increased markedly in recent years, due in part to rising consumer demand for products such as digital cellular telephones, personal communications systems, and multimedia systems. The dominant structure for speech coders at rates above 4 kb/s is Code Excited Linear Prediction (CELP), in which the speech waveform is reproduced as closely as possible. Recently, however, harmonic coding has become increasingly prevalent at rates of 4 kb/s and below. Harmonic coders use a parametric model in an attempt to reproduce the perceptual quality of the speech signal without directly encoding the waveform details. In this thesis, we address some of the challenges of harmonic coding through the development of a new speech codec called Spectral Excitation Coding (SEC). SEC is a harmonic coder which applies a sinusoidal model to the excitation signal rather than to the speech signal directly. The same model is used to process both voiced and unvoiced speech through the use of an adaptive algorithm for phase dispersion. Informal listening test results are presented which indicate that the quality of SEC operating at 2.4 kb/s is close to that of existing standard codecs operating at over 4 kb/s. The SEC system incorporates a new technique for vector quantization of the variable-dimension harmonic magnitude vector, called Non-Square Transform Vector Quantization (NSTVQ). NSTVQ addresses the problem of variable-dimension vector quantization by combining a fixed-dimension vector quantizer with a set of variable-sized non-square transforms. We discuss the factors which influence the choice of transform in NSTVQ, as well as several algorithm features, including single-parameter control over the tradeoff between complexity and distortion, simpler use of vector prediction techniques, and inherent embedded coding.
Experimental results show that NSTVQ outperforms several existing techniques, providing lower distortion along with lower complexity and storage requirements. Results are presented which indicate that NSTVQ used in the Improved Multiband Excitation (IMBE) environment could achieve equivalent spectral distortion while reducing the overall rate by 1000-1250 bits per second.
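
The NSTVQ idea described above, a fixed-dimension quantizer combined with a family of non-square transforms, can be illustrated with a minimal sketch. The abstract does not say which transform is used, so the code below assumes a truncated (or zero-padded) orthonormal DCT-II as one plausible choice. The function names (`dct_matrix`, `forward`, `inverse`, `nstvq_encode`, `nstvq_decode`) and the full-search codebook are hypothetical and chosen only for illustration; this is not the thesis implementation.

```python
import math

def dct_matrix(k, n):
    """Assumed K x N non-square transform: the first k rows of the n-point
    orthonormal DCT-II matrix. Rows with index >= n are all zero (zero padding)."""
    rows = []
    for i in range(k):
        if i >= n:
            rows.append([0.0] * n)
            continue
        scale = math.sqrt(1.0 / n) if i == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * j + 1) * i / (2 * n))
                     for j in range(n)])
    return rows

def forward(x, k):
    """Map a variable-dimension vector x (length N) to a fixed dimension k."""
    return [sum(b * v for b, v in zip(row, x)) for row in dct_matrix(k, len(x))]

def inverse(y, n):
    """Map a fixed-dimension vector y back to dimension n using the transpose."""
    b = dct_matrix(len(y), n)
    return [sum(b[i][j] * y[i] for i in range(len(y))) for j in range(n)]

def nearest(codebook, y):
    """Full-search nearest neighbour under squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda c: sum((a - v) ** 2 for a, v in zip(codebook[c], y)))

def nstvq_encode(x, codebook):
    """Transform x to the codebook dimension, then return the best index."""
    return nearest(codebook, forward(x, len(codebook[0])))

def nstvq_decode(index, codebook, n):
    """Look up the fixed-dimension codevector and map it back to dimension n."""
    return inverse(codebook[index], n)
```

In this sketch a single codebook of dimension K serves every input dimension N: the decoder only needs the index and N (which in a harmonic coder would follow from the pitch) to rebuild a magnitude vector of the right length. When N is at most K the transform pair is lossless before quantization; when N exceeds K the truncation itself contributes distortion, which is one way a single parameter such as K could trade complexity against distortion.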
