A High-Quality Speech and Audio Codec With Less Than 10-ms Delay

With increasing quality requirements for multimedia communications, audio codecs must maintain both high quality and low delay. Typically, audio codecs offer either low delay or high quality, but rarely both. We propose a codec that addresses both requirements simultaneously, with a delay of only 8.7 ms at 44.1 kHz. It uses gain-shape algebraic vector quantization in the frequency domain with time-domain pitch prediction. We demonstrate that the proposed codec operating at 48 kb/s and 64 kb/s outperforms both G.722.1C and MP3 and has quality comparable to AAC-LD, despite having less than one-fourth of the algorithmic delay of these codecs.
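To make the phrase "gain-shape algebraic vector quantization" concrete, the following is a minimal sketch, not the codec's actual implementation. It assumes the shape codebook is a pyramid of signed integer pulse vectors whose absolute values sum to k, searched greedily, and that the gain is quantized uniformly in the log2 domain. The function names (pvq_search, encode, decode) and the step size of 0.25 are hypothetical choices made for illustration only.

```python
import math

def pvq_search(x, k):
    """Greedy search for an integer vector y with sum(|y_i|) == k whose
    direction approximates that of x (illustrative, not optimized)."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    if l1 == 0:
        # No direction to match: place all pulses on the first coefficient.
        return [k] + [0] * (n - 1)
    # Start from the projection onto the pyramid, rounded toward zero.
    y = [int(k * abs(v) / l1) for v in x]
    pulses_left = k - sum(y)
    while pulses_left > 0:
        xy = sum(abs(a) * b for a, b in zip(x, y))
        yy = sum(b * b for b in y)
        best_i, best_score = 0, -1.0
        for i in range(n):
            # Squared normalized correlation if one pulse is added at i.
            new_xy = xy + abs(x[i])
            new_yy = yy + 2 * y[i] + 1
            score = new_xy * new_xy / new_yy
            if score > best_score:
                best_i, best_score = i, score
        y[best_i] += 1
        pulses_left -= 1
    # Restore the signs of the input coefficients.
    return [b if a >= 0 else -b for a, b in zip(x, y)]

def encode(x, k, step=0.25):
    """Split x into a log-quantized gain index and a pulse-vector shape."""
    gain = math.sqrt(sum(v * v for v in x))
    gain_index = round(math.log2(gain) / step) if gain > 0 else None
    return gain_index, pvq_search(x, k)

def decode(gain_index, shape, step=0.25):
    """Rebuild the vector: unit-norm shape scaled by the quantized gain."""
    if gain_index is None:
        return [0.0] * len(shape)
    gain_q = 2.0 ** (gain_index * step)
    norm = math.sqrt(sum(p * p for p in shape))
    return [gain_q * p / norm for p in shape]
```

Because the decoder normalizes the shape before scaling, the energy of the reconstructed vector equals the quantized gain regardless of how coarsely the shape is coded. That is the usual motivation for separating gain from shape in this kind of quantizer.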
