Frequency domain coding of speech

Frequency-domain techniques for speech coding have recently received considerable attention. The basic concept of these methods is to divide the speech signal into frequency components, either by a filter bank (sub-band coding) or by a suitable transform (transform coding), and then to encode those components using adaptive PCM. Three basic factors are involved in the design of these coders: 1) the type of filter bank or transform, 2) the choice of bit allocation and its associated noise-shaping properties, and 3) the control of the step-sizes of the encoders. This paper reviews the basic design considerations for each of these three factors in sub-band and transform coders. Concepts of short-time analysis/synthesis are first discussed and used to establish a basic theoretical framework. It is then shown how practical realizations of sub-band and transform coding can be interpreted within this framework. Principles of spectral estimation and models of speech production and perception are then discussed and used to illustrate how the "side information" can be most efficiently represented and utilized in the design of the coder (particularly the adaptive transform coder) to control the dynamic bit allocation and quantizer step-sizes. Recent developments and examples of the "vocoder-driven" adaptive transform coder for low bit-rate applications are then presented.
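
As an illustration of the second and third design factors, the following minimal sketch shows one common way to turn per-band variance estimates (the "side information") into a bit allocation and quantizer step-sizes. It assumes the classical log-variance allocation rule and a uniform quantizer scaled to a fixed multiple of the band standard deviation; the abstract does not state which rule the paper uses. The function names `allocate_bits` and `adaptive_step_size` and the `loading` parameter are hypothetical and introduced here only for illustration.

```python
import math


def allocate_bits(band_variances, avg_bits):
    """Assumed log-variance rule: b_i = avg_bits + 0.5 * log2(var_i / geometric mean).

    Returns real-valued bit counts whose mean equals avg_bits. A practical
    coder would still have to round them and clip negative values.
    """
    n = len(band_variances)
    log_vars = [math.log2(v) for v in band_variances]
    mean_log = sum(log_vars) / n  # log2 of the geometric mean of the variances
    return [avg_bits + 0.5 * (lv - mean_log) for lv in log_vars]


def adaptive_step_size(band_variance, bits, loading=4.0):
    """Step-size of a uniform quantizer spanning +/- loading * sigma with 2**bits levels.

    The loading factor of 4 is an illustrative assumption, not a value from the paper.
    """
    sigma = math.sqrt(band_variance)
    return 2.0 * loading * sigma / (2 ** bits)


if __name__ == "__main__":
    variances = [16.0, 4.0, 1.0, 0.25]  # hypothetical band variance estimates
    bits = allocate_bits(variances, avg_bits=2.0)
    for var, b in zip(variances, bits):
        print(f"variance={var:6.2f}  bits={b:4.2f}  step={adaptive_step_size(var, b):.3f}")
```

Under this assumed rule, bands with larger variance receive more bits while the total bit budget is preserved, so the quantization noise is spread roughly evenly across bands; noise shaping would modify the rule by weighting the variances perceptually.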
