Signal-adaptive transform kernel switching for stereo audio coding

Modern stereo and multi-channel perceptual audio codecs based on the modified discrete cosine transform (MDCT) can achieve very good overall coding quality even at low bit rates, but they are less efficient on some material with an inter-channel phase difference (IPD) of about ±90 degrees. To address this issue, a generalization of the lapped transform coding scheme is proposed which retains the perfect reconstruction property while allowing the use of three further transform kernels, one of which is the modified discrete sine transform (MDST). Blind listening tests indicate that, by adapting each channel's transform kernel frame by frame to the instantaneous IPD characteristics, notable gains in coding quality are possible with only a negligible increase in decoder complexity and parameter rate.
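The effect of an IPD near 90 degrees on MDCT-domain stereo redundancy can be illustrated with a small numerical sketch. It is not the proposed coding scheme: it assumes the common textbook definitions of the MDCT and MDST with a sine window, a single frame, and a stationary sinusoid whose right channel lags the left by 90 degrees. The function names (`lapped_transform`, `normalized_correlation`, `ipd_demo`) and all parameter values are hypothetical, chosen only for illustration. The sketch compares the inter-channel correlation of the coefficient vectors when both channels use the MDCT kernel with the correlation when the right channel uses the MDST kernel instead.

```python
import math

def lapped_transform(frame, kernel="mdct"):
    """Sine-windowed MDCT ("mdct") or MDST (any other value) of one frame.

    Takes 2N time samples and returns N coefficients (textbook definition,
    unnormalized; an assumption, not taken from the paper).
    """
    two_n = len(frame)
    n_coef = two_n // 2
    trig = math.cos if kernel == "mdct" else math.sin
    # The sine window satisfies the Princen-Bradley condition.
    windowed = [math.sin(math.pi * (n + 0.5) / two_n) * x
                for n, x in enumerate(frame)]
    coefs = []
    for k in range(n_coef):
        scale = math.pi / n_coef * (k + 0.5)
        coefs.append(sum(v * trig(scale * (n + 0.5 + n_coef / 2))
                         for n, v in enumerate(windowed)))
    return coefs

def normalized_correlation(a, b):
    """Inner product of two coefficient vectors divided by their norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def ipd_demo(n_coef=128, bin_position=20.3, start_phase=0.7):
    """Return (correlation with MDCT/MDCT, correlation with MDCT/MDST)."""
    omega = math.pi * (bin_position + 0.5) / n_coef
    left = [math.cos(omega * n + start_phase) for n in range(2 * n_coef)]
    # Right channel: same sinusoid, delayed in phase by 90 degrees.
    right = [math.cos(omega * n + start_phase - math.pi / 2)
             for n in range(2 * n_coef)]
    left_mdct = lapped_transform(left, "mdct")
    same_kernel = normalized_correlation(
        left_mdct, lapped_transform(right, "mdct"))
    switched_kernel = normalized_correlation(
        left_mdct, lapped_transform(right, "mdst"))
    return same_kernel, switched_kernel

if __name__ == "__main__":
    same_kernel, switched_kernel = ipd_demo()
    print("MDCT/MDCT correlation:", round(same_kernel, 4))
    print("MDCT/MDST correlation:", round(switched_kernel, 4))
```

Under these assumptions, the MDST spectrum of the phase-shifted channel closely matches the MDCT spectrum of the other channel, so tools that exploit inter-channel redundancy have more to work with after the kernel switch. The sketch does not cover the remaining two kernels or how perfect reconstruction is preserved when the kernel changes between frames.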
