The Sensitivity Matrix: Using Advanced Auditory Models in Speech and Audio Processing

Perceptually optimal processing of speech and audio signals requires a distortion measure that resembles human perception, which in turn calls for measures based on sophisticated auditory models. Under the assumption of small distortions, these models can be simplified by means of a sensitivity matrix. In this paper, we show the power of this approach. We present a method to derive the sensitivity matrix for distortion measures based on spectro-temporal auditory models and apply it to an example auditory model. We discuss the region of validity of the approximation and the use of linear algebra to analyze the characteristics of the given model. Furthermore, we show how to build a coder that minimizes a sensitivity matrix distortion measure, given the typically long support of a perceptual distortion measure.
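
The small-distortion simplification can be illustrated with a minimal numerical sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_model` is a hypothetical two-band "filter plus log compression" stand-in for an auditory model, the distortion is the squared Euclidean distance between model outputs, and the sensitivity matrix is taken as S = J^T J (J being the Jacobian of the model), so that the distortion is approximately e^T S e for a small error e. Other conventions define the matrix through the Hessian of the distortion and differ by a constant factor.

```python
import math

def toy_model(x):
    """Hypothetical stand-in for an auditory model (not the paper's model):
    two fixed 'filters' followed by log compression of the band energy."""
    low = x[0] + x[1] + x[2]
    high = x[0] - x[1] + x[2]
    return [math.log1p(low * low), math.log1p(high * high)]

def distortion(x, x_hat):
    """Squared Euclidean distance between the model outputs."""
    return sum((a - b) ** 2 for a, b in zip(toy_model(x), toy_model(x_hat)))

def jacobian(f, x, h=1e-6):
    """Central-difference Jacobian, J[k][n] = d f_k / d x_n."""
    rows = len(f(x))
    J = [[0.0] * len(x) for _ in range(rows)]
    for n in range(len(x)):
        xp, xm = list(x), list(x)
        xp[n] += h
        xm[n] -= h
        fp, fm = f(xp), f(xm)
        for k in range(rows):
            J[k][n] = (fp[k] - fm[k]) / (2 * h)
    return J

def sensitivity_matrix(f, x):
    """S = J^T J (assumed convention), evaluated at the clean signal x."""
    J = jacobian(f, x)
    n = len(x)
    return [[sum(row[i] * row[j] for row in J) for j in range(n)]
            for i in range(n)]

def quadratic_form(S, e):
    """Evaluate e^T S e."""
    n = len(e)
    return sum(e[i] * S[i][j] * e[j] for i in range(n) for j in range(n))

def relative_error(x, S, scale):
    """Relative gap between the exact distortion and its quadratic approximation."""
    e = [scale, -scale, 0.5 * scale]
    x_hat = [a + b for a, b in zip(x, e)]
    exact = distortion(x, x_hat)
    return abs(exact - quadratic_form(S, e)) / exact

x = [0.5, -0.2, 0.8]
S = sensitivity_matrix(toy_model, x)
for scale in (1e-3, 1e-1):
    print(f"error scale {scale:g}: relative approximation error "
          f"{relative_error(x, S, scale):.2e}")
```

In this toy setting the quadratic form tracks the exact distortion closely for the smaller error and degrades as the error grows, which is the sense in which such an approximation has a limited region of validity. Because S is an ordinary symmetric matrix, standard linear-algebra tools (for example, an eigendecomposition) can be applied to it.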
