Paper information - Bit allocation in time and frequency domains for predictive coding of speech

Adaptive predictive coding (APC) with dynamic bit allocation is presented for speech encoding at low to medium bit rates (6.4 kbits/s to 16 kbits/s). The system combines a split-band predictive coding scheme with a bit allocation scheme in order to remove the redundancies due to the periodic concentration of the prediction residual energy and to the nonuniform nature of the speech spectrum. Quantization bits are dynamically allocated both over the subbands (in the frequency domain) and over the subintervals (in the time domain), in accordance with the distribution of the residual energies in the time-frequency domain. The optimum bit allocation is derived from the mean square error criterion on the speech waveform. The SNR gain is expressed as the sum of the spectral SNR gain G_f, equivalent to the prediction gain, and the temporal SNR gain G_t. Although G_t is much smaller than G_f, temporal bit allocation improves the actual SNR performance of the APC system by more than the value expected from its SNR gain at bit rates below 2 bits/sample. A study of the segmental SNR performance of various coder designs shows that a design using three subbands, four subintervals, and a fourth-order predictor in each subband is the most appropriate for speech encoding in the range of 6.4 kbits/s to 16 kbits/s. The system is evaluated in terms of segmental SNR and subjective speech quality. The results show a substantial improvement over the conventional full-band APC system in SNR performance and predictor loop stability. It is also shown that the system can provide speech quality subjectively equivalent to 7-bit log-PCM at 16 kbits/s and to 6-bit log-PCM at 9.6 kbits/s.
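The abstract does not give the allocation formula itself, so the following is only a minimal sketch of the idea, assuming the classic textbook rule for minimizing mean square error under a fixed bit budget: each time-frequency cell receives the average bit rate plus half the base-2 logarithm of its residual energy relative to the geometric mean of all cell energies. The function names, the 3-by-4 energy grid, and the assumption that every cell holds the same number of samples are illustrative and are not taken from the paper. The gain function is the standard ratio of arithmetic to geometric mean, which may differ from the paper's exact definitions of G_f and G_t.

```python
import math

def allocate_bits(energy, avg_bits):
    """Real-valued bits per sample for each cell of energy[band][interval].

    Assumed textbook rule (not quoted from the paper):
        b = avg_bits + 0.5 * log2(cell_energy / geometric_mean_energy)
    All energies must be strictly positive.
    """
    cells = [e for row in energy for e in row]
    mean_log = sum(math.log2(e) for e in cells) / len(cells)
    return [[avg_bits + 0.5 * (math.log2(e) - mean_log) for e in row]
            for row in energy]

def allocation_gain_db(energy):
    """Arithmetic-to-geometric-mean ratio of the cell energies, in dB."""
    cells = [e for row in energy for e in row]
    arith = sum(cells) / len(cells)
    mean_log = sum(math.log2(e) for e in cells) / len(cells)
    return 10.0 * math.log10(arith / 2.0 ** mean_log)

# Hypothetical residual energies: 3 subbands (rows) x 4 subintervals (columns).
RESIDUAL_ENERGY = [
    [16.0, 4.0, 4.0, 1.0],
    [4.0, 1.0, 1.0, 0.25],
    [1.0, 0.25, 0.25, 0.0625],
]

if __name__ == "__main__":
    for row in allocate_bits(RESIDUAL_ENERGY, avg_bits=2.0):
        print(row)
    print(allocation_gain_db(RESIDUAL_ENERGY))
```

The rule preserves the average bit rate by construction and gives more bits to cells with more residual energy. A practical coder would also have to round the result to nonnegative integers and redistribute the remainder, which this sketch leaves out.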

Masaaki Honda | F. Itakura
