A new psychoacoustical masking model for audio coding applications

Psychoacoustical masking models have been widely used in audio coding applications over the past decades. In such applications, it is typically assumed that the original input signal serves as a masker for the distortions introduced by the lossy coding method. Such masking models are based on the peripheral bandpass filtering properties of the auditory system and essentially evaluate the distortion-to-masker ratio within each auditory filter. Until now, these models have assumed that the masking of distortions is governed by the single auditory filter in which the distortion-to-masker ratio is largest. This assumption, however, is not in line with recent findings in psychoacoustics. A more accurate assumption is that the human auditory system integrates distortions that are present across a range of auditory filters. In this contribution, a new model is presented that is in line with recent psychoacoustical studies and is suitable for use within an audio codec. Although the model can be used to derive a masking curve, it also provides a measure of the detectability of distortions, provided that the distortions are not too large.
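The contrast between the two assumptions can be illustrated with a minimal sketch. It is not the model proposed in this contribution: the function names, the plain summation used as the integration rule, the power values, and the reading of a detectability of 1.0 as the threshold are all assumptions made for illustration only.

```python
# Illustrative sketch only: plain summation and the 1.0 threshold are
# assumptions, not the model described in the abstract.

def distortion_to_masker_ratios(distortion_power, masker_power):
    """Return the distortion-to-masker power ratio in each auditory filter."""
    if len(distortion_power) != len(masker_power):
        raise ValueError("one distortion and one masker value per filter required")
    return [d / m for d, m in zip(distortion_power, masker_power)]


def detectability_max(ratios):
    """Classical assumption: the filter with the largest ratio governs masking."""
    return max(ratios)


def detectability_integrated(ratios):
    """Alternative assumption: distortions are integrated across filters
    (hypothetical rule: a plain sum of the per-filter ratios)."""
    return sum(ratios)


# Hypothetical powers at the outputs of four auditory filters (arbitrary units).
masker = [4.0, 4.0, 4.0, 4.0]
distortion = [2.0, 2.0, 2.0, 2.0]

ratios = distortion_to_masker_ratios(distortion, masker)
print(detectability_max(ratios))         # 0.5
print(detectability_integrated(ratios))  # 2.0
```

With these hypothetical values, every filter has a ratio of 0.5, so the largest-ratio rule predicts that the distortion stays below the assumed threshold of 1.0. The summed measure reaches 2.0, which shows how a distortion spread over several filters can be predicted as detectable once it is integrated across them.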
