A quasi-orthogonal, invertible, and perceptually relevant time-frequency transform for audio coding

We describe ERB-MDCT, an invertible, real-valued time-frequency transform based on the MDCT, which is widely used in audio coding (e.g. MP3 and AAC). Most invertible transforms, the MDCT included, use a uniform frequency scale. ERB-MDCT instead follows the design of ERBLet, a recent invertible transform whose resolution varies across frequency to match the perceptual ERB frequency scale. ERB-MDCT has mostly the same frequency scale as ERBLet; its main improvement is that its atoms are quasi-orthogonal, i.e. its redundancy is close to 1. Furthermore, its energy is sparser in the time-frequency plane. ERB-MDCT is therefore better suited to audio coding than ERBLet.
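
To make the contrast between a uniform and an ERB frequency scale concrete, the following minimal sketch evaluates the commonly quoted Glasberg-Moore ERB formulas and compares them with the constant bin spacing of a uniform transform. It illustrates the frequency scale only and is not the ERB-MDCT construction itself; the function names, the `bands_per_erb` parameter, and the sample rate and bin count in the usage note are assumptions chosen for illustration.

```python
import math


def erb_bandwidth(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz
    (Glasberg-Moore approximation)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz):
    """Map a frequency in Hz to the ERB-rate scale (number of ERBs below f_hz),
    using the commonly quoted rounded constant 21.4."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb_rate):
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) / 0.00437


def erb_band_centers(f_max_hz, bands_per_erb=1):
    """Centre frequencies equally spaced on the ERB-rate scale from 0 Hz up to
    at most f_max_hz. bands_per_erb is a hypothetical density parameter."""
    n = int(math.floor(hz_to_erb_rate(f_max_hz) * bands_per_erb))
    return [erb_rate_to_hz(k / bands_per_erb) for k in range(n + 1)]


def uniform_bin_spacing(sample_rate_hz, n_bins):
    """Constant spacing in Hz between bins of a uniform transform that splits
    the range from 0 to Nyquist into n_bins bands."""
    return (sample_rate_hz / 2.0) / n_bins
```

For example, `erb_band_centers(22050.0)` returns centre frequencies whose spacing widens steadily with frequency, whereas `uniform_bin_spacing(44100.0, 1024)` gives the same spacing of roughly 21.5 Hz for every bin.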