Audio compression at low bit rates using a signal adaptive switched filterbank

A perceptual audio coder typically consists of a filter-bank that decomposes the signal into its frequency components, which are then quantized under the control of a perceptual masking model. Previous efforts have indicated that a high resolution filter-bank, e.g., the modified discrete cosine transform (MDCT) with 1024 subbands, minimizes the bit rate requirements for most music samples. The high resolution MDCT, however, is not suitable for encoding non-stationary segments of music. A long/short resolution or "window" switching scheme has been employed to overcome this problem, but it has certain inherent disadvantages which become prominent at lower bit rates (below 64 kbps for stereo). We propose a novel switched filter-bank scheme which switches between an MDCT and a wavelet filter-bank based on the signal characteristics. A tree structured wavelet filter-bank with properly designed filters offers natural advantages for the representation of non-stationary segments such as attacks. Furthermore, it allows for the optimum exploitation of perceptual irrelevancies.
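The switching idea can be illustrated with a minimal sketch. The abstract does not state which signal characteristics drive the switch, so the criterion below (a sharp rise in energy between consecutive sub-blocks of a frame, taken as a sign of an attack) is an assumption made purely for illustration and is not the authors' method. The function names, the number of sub-blocks, and the threshold are likewise hypothetical.

```python
import math

def subblock_energies(frame, n_sub=8):
    """Split a frame into n_sub equal sub-blocks and return the energy of each."""
    size = len(frame) // n_sub
    return [sum(x * x for x in frame[i * size:(i + 1) * size])
            for i in range(n_sub)]

def choose_filterbank(frame, n_sub=8, ratio_threshold=10.0, floor=1e-9):
    """Pick a filter-bank label for one frame.

    Assumed criterion (not from the paper): if the energy of any sub-block
    exceeds that of the preceding sub-block by more than ratio_threshold,
    treat the frame as non-stationary and return "wavelet"; otherwise
    return "mdct".
    """
    energies = subblock_energies(frame, n_sub)
    for prev, cur in zip(energies, energies[1:]):
        # floor avoids division by zero on silent sub-blocks
        if cur / (prev + floor) > ratio_threshold:
            return "wavelet"
    return "mdct"

# A steady tone, and the same tone with a sudden onset halfway through the frame.
n = 1024
tone = [math.sin(2 * math.pi * 440 * t / 44100) for t in range(n)]
attack = [0.001 * s for s in tone[:n // 2]] + tone[n // 2:]

print(choose_filterbank(tone))
print(choose_filterbank(attack))
```

With this criterion the steady tone stays on the high resolution path, while the frame with the onset is routed to the wavelet path. A real coder would also have to handle the transition between the two filter-banks, which this sketch leaves out.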
