An Audio Quantizer Based on a Time Domain Auditory Masking Model

ABSTRACT

A novel, non-uniform PCM audio quantizer is described, employing a time-domain Computational Auditory Masking Model (CAMM). The model uses the concept of Signal Dependent Compression to produce an internal representation of the input signal, from which a decision device derives a time-varying threshold. Based on this model, the proposed quantizer evaluates masked and unmasked regions of the signal, so that an iterative process can generate the desired variable bit allocation table for quantizing the audio samples. Preliminary results indicate good perceptual quality at an average rate of 6.7 bits/sample.

0. INTRODUCTION

Optimal methods for uniform quantization of digital audio signals have been theoretically established for many years and are successfully employed in most audio applications [1-4]. Nevertheless, such systems rely on simple psychoacoustic principles and cannot fully exploit recent advances in auditory modeling. Here, a detailed, time-domain computational auditory masking model [6,7] is employed and extended for non-uniform quantization of audio signals. This work introduces a novel procedure that uses a set of thresholds to quantize the audio signal with a variable bit allocation per sample. The proposed method employs many principles found in perceptual audio coders [12,13], but auditory modeling and processing are performed in the time domain. Apart from some bitrate gains with respect to uniform PCM quantizers, the proposed method provides a useful tool that identifies the exact time regions of the audio signal where any form of quantization introduces significant distortion. In addition, a potential structure for a non-uniform quantizer is proposed which allows low bit rate and low latency coding of audio samples at sufficient audio quality, without the need for any form of decoding for mapping to 16-bit PCM.
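The iterative generation of a variable bit allocation table can be illustrated with a minimal sketch. Here a fixed error threshold stands in for the CAMM-derived, time-varying masked threshold, and the function names, block length and word-length bounds are illustrative assumptions, not the paper's actual parameters:

```python
import numpy as np

def quantize(x, bits):
    """Mid-tread uniform quantizer for signals in [-1, 1)."""
    levels = 2 ** (bits - 1)
    return np.round(x * levels) / levels

def allocate_bits(x, threshold, block=64, b_min=4, b_max=16):
    """Iteratively raise each block's word length until the peak
    quantization error falls below the (here: fixed) threshold."""
    n_blocks = int(np.ceil(len(x) / block))
    bits = np.full(n_blocks, b_min)
    for i in range(n_blocks):
        seg = x[i * block:(i + 1) * block]
        while bits[i] < b_max:
            err = np.max(np.abs(seg - quantize(seg, bits[i])))
            if err <= threshold:
                break
            bits[i] += 1
    return bits
```

In the actual scheme the threshold would vary per time region according to the masking model, so perceptually masked regions receive fewer bits and the average rate drops below 16 bits/sample.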
The paper is organized as follows: in Section 1, the Computational Auditory Masking Model (CAMM) is briefly analyzed and its use in the quantization process is presented. In Section 2 the quantization according to the CAMM is described. Simulation results, accompanied by a perceptual evaluation, are given in Section 3. Finally, some conclusions are drawn in Section 4.

1. THE TIME DOMAIN AUDITORY MASKING MODEL

This work relies on the computational model proposed by Buchholz and Mourjopoulos [6,7]. This Computational Auditory Masking Model (CAMM) successfully emulates many aspects of the monaural signal processing of the auditory system and is based on the concept of Signal Dependent Compression (SDC) (see Figure 1). The SDC concept assumes that the auditory system compresses the amplitude of the input signal, with compressive characteristics that depend on the input signal's evolution.

Zarouchas et al. Time domain masking quantizer Page 2 of 7

[Figure 1: Block diagram of the CAMM]

As can be seen, the input signal is passed through a preprocessing stage consisting of a gammatone bandpass filterbank, a full-wave rectifier and a low-pass filter, followed by a Signal Dependent Compression (SDC) module and a temporal integrator. This CAMM can be used as a non-uniform quantizer in the following way:
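The preprocessing chain described above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions: one gammatone band instead of a full filterbank, simple first-order smoothing and integration stages with hypothetical time constants, and a fixed compression exponent as a placeholder for the actual SDC stage, whose compression adapts to the signal's history:

```python
import numpy as np

def gammatone_ir(fc, fs, dur=0.05, order=4):
    """Impulse response of a 4th-order gammatone filter centered at fc,
    with an ERB-based bandwidth (standard auditory-filter form)."""
    t = np.arange(int(dur * fs)) / fs
    b = 1.019 * (24.7 + 0.108 * fc)  # ERB bandwidth in Hz
    g = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return g / np.max(np.abs(g))

def camm_front_end(x, fs, fc=1000.0, lp_tau=0.001, int_tau=0.01, c=0.5):
    # 1) gammatone bandpass filtering (one channel of the filterbank)
    y = np.convolve(x, gammatone_ir(fc, fs))[:len(x)]
    # 2) full-wave rectification
    y = np.abs(y)
    # 3) first-order low-pass filter (envelope smoothing)
    a = np.exp(-1.0 / (lp_tau * fs))
    env = np.zeros_like(y)
    for n in range(1, len(y)):
        env[n] = a * env[n - 1] + (1 - a) * y[n]
    # 4) compression stage: a fixed power law stands in for the SDC,
    #    whose exponent would actually depend on the signal's evolution
    comp = env ** c
    # 5) temporal integration (leaky integrator)
    ai = np.exp(-1.0 / (int_tau * fs))
    out = np.zeros_like(comp)
    for n in range(1, len(comp)):
        out[n] = ai * out[n - 1] + (1 - ai) * comp[n]
    return out
```

The resulting internal representation is the quantity from which the decision device would derive the time-varying masked threshold used by the quantizer.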

REFERENCES

[1] E. Owens et al., "An Introduction to the Psychology of Hearing," 1997.

[2] E. Eberlein et al., "Advanced Audio Measurement System Using Psychoacoustic Properties," 1992.

[3] K. H. Barratt, "Digital Coding of Waveforms," 1985.

[4] R. C. Maher, "On the Nature of Granulation Noise in Uniform Quantization Systems," 1992.

[5] J. Mourjopoulos et al., "A Computational Auditory Masking Model Based on Signal-Dependent Compression. I. Model Description and Performance Analysis," 2004.

[6] B. Moore, "An Introduction to the Psychology of Hearing," 1977.

[7] G. G. Stokes, "J.," 1890, The New Yale Book of Quotations.

[8] J. Vanderkooy et al., "Minimally Audible Noise Shaping," 1991.

[9] J. D. Gibson et al., "Digital Coding of Waveforms: Principles and Applications to Speech and Video," Proceedings of the IEEE, 1985.

[10] H. Kihara et al., "Digital Audio Signal Processing," 1990.

[11] M. Kahrs et al., "Applications of Digital Signal Processing to Audio and Acoustics," 1998.

[12] B. Mondal, "Perceptual Quantization Using JNLD Thresholds," 2002 IEEE International Symposium on Circuits and Systems, 2002.

[13] M. Bosi et al., "Introduction to Digital Audio Coding and Standards," 2004.

[14] W. Verhelst et al., "On Psychoacoustic Noise Shaping for Audio Requantization," 2003 International Conference on Multimedia and Expo (ICME '03), 2003.

[15] R. A. Wannamaker, "Psychoacoustically Optimal Noise Shaping," 1992.

[16] J. D. Johnston et al., "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)," 1996.

[17] J. Vanderkooy et al., "Quantization and Dither: A Theoretical Survey," 1992.

[18] H. Fastl et al., "Psychoacoustics: Facts and Models," 2nd updated edition, 1999.

[19] G. Blelloch, "Introduction to Data Compression," 2022.

[20] M. A. Gerzon et al., "Optimal Noise Shaping and Dither of Digital Signals," 1989.