Perceptual Audio Coding with Adaptive Non-uniform Time/frequency Tilings Using Subband Merging and Time Domain Aliasing Reduction

In this paper, we investigate the coding efficiency of perceptual coding using an adaptive non-uniform orthogonal filterbank based on MDCT analysis/synthesis and time domain aliasing reduction. We compare its performance to that of a system using a traditional adaptive uniform MDCT filterbank with window switching. The comparison is performed using a listening test at two different quantization settings. The statistical evaluation shows that the perceptual quality of the non-uniform filterbank is significantly higher than that of the uniform filterbank, by 5 to 10 MUSHRA points.
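As background for the MDCT analysis/synthesis stage that both systems share, the following is a minimal sketch of a uniform MDCT filterbank with a sine window and overlap-add synthesis. It illustrates only the standard time domain aliasing cancellation property of the MDCT, not the subband merging or time domain aliasing reduction steps of the non-uniform filterbank. The function names, the sine window, the even frame size `N = 8`, and the test signal are all assumptions chosen for illustration.

```python
import math

def sine_window(N):
    # Sine window of length 2N; it satisfies w[n]^2 + w[n+N]^2 = 1.
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, window):
    # Forward MDCT: 2N windowed input samples -> N coefficients.
    N = len(frame) // 2
    n0 = 0.5 + N / 2
    return [
        sum(window[n] * frame[n]
            * math.cos(math.pi / N * (n + n0) * (k + 0.5))
            for n in range(2 * N))
        for k in range(N)
    ]

def imdct(coeffs, window):
    # Inverse MDCT: N coefficients -> 2N windowed, time-aliased samples.
    # The 2/N scaling matches the windowed analysis/synthesis pair.
    N = len(coeffs)
    n0 = 0.5 + N / 2
    return [
        window[n] * (2.0 / N)
        * sum(coeffs[k] * math.cos(math.pi / N * (n + n0) * (k + 0.5))
              for k in range(N))
        for n in range(2 * N)
    ]

def analysis_synthesis(signal, N):
    # Assumes N is even and len(signal) is a multiple of N.
    w = sine_window(N)
    # Pad by N zeros on each side so every sample lies in two frames.
    padded = [0.0] * N + list(signal) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * N + 1, N):
        y = imdct(mdct(padded[start:start + 2 * N], w), w)
        # Overlap-add: aliasing terms of adjacent frames cancel here.
        for n in range(2 * N):
            out[start + n] += y[n]
    return out[N:N + len(signal)]

signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(32)]
reconstructed = analysis_synthesis(signal, 8)
max_error = max(abs(a - b) for a, b in zip(signal, reconstructed))
```

Each individual frame output is time-aliased on its own; only the sum of overlapping frames restores the input, so `max_error` should stay at the level of floating-point rounding. A perceptual coder would quantize the coefficients between `mdct` and `imdct`, which is omitted here.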
