Perceptual Audio Coding Using Sinusoidal/Optimum Wavelet Representation

Abstract: A perceptual audio coder has been constructed in which each audio segment is adaptively analyzed using either a sinusoidal or an optimum wavelet basis, according to the time-varying characteristics of the audio signal. The basis optimization is achieved by a novel switched filter bank scheme, which switches between a uniform filter bank structure (discrete cosine transform) and a non-uniform filter bank structure (discrete wavelet transform). Pre-echo distortion, a major artifact of the ISO/Moving Picture Experts Group (MPEG) audio coding standard (MPEG-1 layers 1 and 2), which uses a uniform filter bank structure for audio signal analysis, is almost eliminated in the proposed coder. The coder employs a perceptual masking model implemented using a high-resolution wavelet packet filter bank with 27 subbands that closely mimic the critical bands of the human auditory system. The resulting scheme is a variable bit-rate audio coder that provides compression ratios comparable to MPEG-1 layers 1 and 2 with almost transparent quality.
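
The per-segment switching idea can be illustrated with a minimal sketch. The abstract does not state the switching criterion, the wavelet, or the segment length, so everything specific here is an assumption made for illustration: a Haar wavelet stands in for the optimum wavelet, the segment length is a power of two, and the basis is chosen by comparing the l1 norm of the coefficients (a common sparsity measure for orthonormal transforms). The function names `dct_ii`, `haar_dwt`, `l1_cost`, and `choose_basis` are hypothetical and not taken from the coder.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a segment (stand-in for the uniform filter bank)."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out


def haar_dwt(x):
    """Full-depth orthonormal Haar DWT (stand-in for the non-uniform filter bank).

    Assumes len(x) is a power of two. Returns the final approximation
    coefficient followed by detail coefficients, coarsest level first.
    """
    approx = list(x)
    details = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        half = len(approx) // 2
        a = [(approx[2 * i] + approx[2 * i + 1]) / root2 for i in range(half)]
        d = [(approx[2 * i] - approx[2 * i + 1]) / root2 for i in range(half)]
        details = d + details  # coarser levels end up in front
        approx = a
    return approx + details


def l1_cost(coeffs):
    """Sum of absolute values; smaller means a sparser representation."""
    return sum(abs(c) for c in coeffs)


def choose_basis(segment):
    """Pick the transform with the lower l1 cost (assumed criterion)."""
    c_dct = dct_ii(segment)
    c_dwt = haar_dwt(segment)
    if l1_cost(c_dct) <= l1_cost(c_dwt):
        return "dct", c_dct
    return "dwt", c_dwt


if __name__ == "__main__":
    n = 32
    # Stationary tone aligned with a DCT basis function.
    tone = [math.cos(math.pi * (i + 0.5) * 4 / n) for i in range(n)]
    # Isolated click, a crude model of a transient attack.
    click = [0.0] * n
    click[5] = 1.0
    print(choose_basis(tone)[0], choose_basis(click)[0])
```

Under these assumptions, the stationary tone is represented most compactly by the cosine basis, while the isolated click is represented most compactly by the wavelet basis. The click case corresponds to the transient segments in which a uniform filter bank spreads quantization noise across the whole segment, producing pre-echo.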
