Improved low-delay MDCT-based coding of both stationary and transient audio signals

General-purpose MDCT-based audio coders like MP3 or HE-AAC utilize long inter-transform overlap and lookahead-based transform length switching to provide good coding quality for both stationary and non-stationary, i.e. transient, input signals even at low bitrates. In low-delay communication scenarios such as Voice over IP, however, algorithmic delay due to framing and overlap typically needs to be reduced and additional lookahead must be avoided. We show that these restrictions limit the performance of contemporary low-delay transform coders on either stationary or transient material and propose three modifications: an improved noise substitution technique and increased overlap between “long” transforms for stationary frames, and “long to short” transform length switching without lookahead and directly from the long overlap for transient frames. A listening test indicates the merit of these changes when integrated into AAC-LD.
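
As background for the overlap and delay trade-off mentioned above, the following is a minimal sketch of a textbook MDCT analysis/synthesis round trip with time-domain aliasing cancellation. It is not the coder proposed here and not AAC-LD: the 50 % overlap, the sine window, the direct O(N²) transform, and the names `sine_window`, `mdct`, `imdct` and `tdac_roundtrip` are all assumptions chosen for illustration.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def mdct(frame, N):
    """Direct MDCT: 2N windowed input samples -> N coefficients."""
    n0 = 0.5 + N / 2
    return [
        sum(frame[n] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]


def imdct(coeffs, N):
    """Direct inverse MDCT: N coefficients -> 2N time-aliased samples.

    The 2/N scaling matches a window applied at both analysis and synthesis.
    """
    n0 = 0.5 + N / 2
    return [
        (2.0 / N) * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
                        for k in range(N))
        for n in range(2 * N)
    ]


def tdac_roundtrip(signal, N):
    """Window, transform, inverse-transform and overlap-add a signal.

    Assumes len(signal) is a multiple of N and N is even. The signal is
    padded with N zeros on each side so every sample is covered by two frames.
    """
    w = sine_window(N)
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):  # hop size N (50 % overlap)
        frame = [padded[start + n] * w[n] for n in range(2 * N)]
        rec = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += rec[n] * w[n]  # synthesis window + overlap-add
    return out[N:N + len(signal)]


if __name__ == "__main__":
    N = 8
    x = [math.sin(0.3 * i) + 0.1 * i for i in range(4 * N)]
    y = tdac_roundtrip(x, N)
    print(max(abs(a - b) for a, b in zip(x, y)))  # reconstruction error
```

In this sketch each output sample is complete only after the following frame has been overlap-added, which is one way to see why the overlap between transforms contributes to algorithmic delay.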
