Binaural and Multiple-Microphone Signal Processing Motivated by Auditory Perception

Binaural processing is well known to help listeners separate concurrent sound sources and to improve the intelligibility of speech in reverberant environments. This paper describes and compares several ways in which the classic model of interaural cross-correlation, proposed by Jeffress, quantified by Colburn, and further elaborated by Blauert, Lindemann, and others, can be applied to improve the accuracy of automatic speech recognition systems operating in cluttered, noisy, and reverberant environments. Typical implementations begin with an abstraction of the cross-correlation of the incoming signals after nonlinear monaural bandpass processing, but many alternative implementation choices are possible. The implementations considered differ in how an enhanced version of the desired signal is developed using binaural principles, in the extent to which specific processing mechanisms impose suppression motivated by the precedence effect, and in the precise mechanism used to extract interaural time differences.
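As a rough illustration of the processing chain outlined above (monaural bandpass filtering, a nonlinearity, then interaural cross-correlation to extract an interaural time difference), the following Python sketch may help. It is not taken from any of the systems the paper compares. The brick-wall FFT filter, the half-wave rectifier, the 300-800 Hz band, the 1 ms lag range, and the function names `bandpass_fft` and `estimate_itd` are all assumptions chosen for brevity; a realistic front end would use a bank of auditory filters and a more detailed model of the auditory periphery.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude brick-wall bandpass via FFT masking.

    Assumption: stands in for a single auditory (e.g. gammatone) channel.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def estimate_itd(left, right, fs, lo=300.0, hi=800.0, max_itd=1e-3):
    """Estimate the interaural time difference (seconds) in one frequency band.

    A positive value means the right-ear signal lags the left-ear signal.
    All parameter defaults are illustrative assumptions.
    """
    # Monaural stage: bandpass filtering followed by half-wave rectification.
    l = np.maximum(bandpass_fft(left, fs, lo, hi), 0.0)
    r = np.maximum(bandpass_fft(right, fs, lo, hi), 0.0)

    # Binaural stage: cross-correlate over a physiologically plausible lag range.
    max_lag = int(round(max_itd * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    # Restrict the sum to samples where the shifted index never wraps around.
    core = slice(max_lag, len(l) - max_lag)
    xcorr = np.array([np.dot(l[core], np.roll(r, -k)[core]) for k in lags])

    # The lag of the cross-correlation peak is taken as the ITD estimate.
    return lags[np.argmax(xcorr)] / fs
```

Calling `estimate_itd(left, right, fs)` on two equal-length arrays returns the lag, in seconds, of the cross-correlation peak. Picking the peak in a single band is the simplest possible extraction mechanism; the abstract notes that implementations differ precisely in this step and in whether precedence-motivated suppression is added.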
