Low delay LPC and MDCT-based audio coding in the EVS codec

Speech coders operating in the time domain can be extended with a frequency-domain mode to improve the encoding of music, although this is challenging at low delay. In such a scenario, the short analysis window limits the benefit of the transform coder, and the requirement for a delayless switch between the two coders constrains the system further. This paper presents an LPC and MDCT-based audio coder, part of the new 3GPP codec for Enhanced Voice Services (EVS), which aims to address these issues. Several advanced coding tools are introduced to alleviate the constraints: transient handling is improved, harmonic structures are better preserved, and the modeling of zero-quantized frequencies is enhanced. Test results show that the resulting low-delay switched coder brings a clear improvement over a speech coder and is competitive even with audio coders that operate at higher delay.
