Towards a generalized monaural and binaural auditory model for psychoacoustics and speech intelligibility

Auditory perception involves cues in the monaural auditory pathways as well as binaural cues based on differences between the ears. So far, auditory models have often focused on either monaural or binaural experiments in isolation. Although binaural models typically build upon stages of existing monaural models, only a few attempts have been made to extend a monaural model with a binaural stage while using a unified decision stage for monaural and binaural cues. In such approaches, a typical prototype of binaural processing has been the classical equalization-cancellation mechanism, which either involves signal-adaptive delays and provides a single-channel output or can be implemented with tapped delays, providing a high-dimensional multichannel output. This contribution extends the (monaural) generalized envelope power spectrum model with a non-adaptive binaural stage that has only a few fixed output channels. The binaural stage captures features of physiologically motivated hemispheric binaural processing in the form of simplified signal processing stages, yielding a 5-channel monaural and binaural matrix feature "decoder" (BMFD). The back end of the existing monaural model is applied to the 5-channel BMFD output and calculates short-time envelope power and power features. The model is evaluated and discussed for a baseline database of monaural and binaural psychoacoustic experiments from the literature.
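To illustrate why a tapped-delay implementation of equalization-cancellation produces a high-dimensional output, the following is a minimal sketch and not the model proposed here. It assumes integer-sample delays, a circular shift in place of a true delay line, and no gain equalization; the function name `ec_tapped_delays` and all parameter values are hypothetical.

```python
import numpy as np


def ec_tapped_delays(left, right, max_delay):
    """Return the cancellation residual for each tapped delay.

    Assumption: delays are whole samples in [-max_delay, max_delay] and
    are applied to the right-ear signal with a circular shift (np.roll),
    which is a simplification of a real delay line.
    """
    delays = np.arange(-max_delay, max_delay + 1)
    # One output channel per delay tap: left minus the shifted right signal.
    residuals = np.stack([left - np.roll(right, int(d)) for d in delays])
    return delays, residuals


# Hypothetical test signal: the same noise at both ears, with the left
# ear lagging the right ear by 5 samples.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4800)
itd_samples = 5
right = noise
left = np.roll(noise, itd_samples)

delays, residuals = ec_tapped_delays(left, right, max_delay=10)

# Residual power per tap; the tap that matches the interaural delay
# cancels the noise completely in this idealized case.
power = np.mean(residuals ** 2, axis=1)
best_delay = int(delays[np.argmin(power)])
print(residuals.shape, best_delay)
```

With 21 taps the sketch already yields 21 output channels per frequency band, which is the dimensionality that a stage with only a few fixed output channels avoids.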
