Learning a Precedence Effect-Like Weighting Function for the Generalized Cross-Correlation Framework

Speech source localization in reverberant environments has proved difficult for automated microphone array systems. Because speech is nonstationary, certain features observable in the reverberated signal, such as sudden increases in audio energy, provide cues that mark time-frequency regions particularly useful for localization. We exploit these cues by learning a mapping from reverberated-signal spectrograms to localization precision using ridge regression. Applying the learned mappings within the generalized cross-correlation framework, we demonstrate improved localization performance. Moreover, the resulting mappings exhibit behavior consistent with the well-known precedence effect from psychoacoustic studies.
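To make the pipeline concrete, the sketch below shows how a learned per-bin weighting can be dropped into the generalized cross-correlation computation, and how the mapping from spectrogram features to localization precision can be fit with closed-form ridge regression. This is a minimal illustration under assumed interfaces, not the authors' implementation: the function names (gcc_weighted, fit_ridge), the PHAT-style normalization, and the exact feature and target definitions are placeholders for details the paper would specify.

```python
import numpy as np

def gcc_weighted(x1, x2, weights, nfft=1024):
    """Generalized cross-correlation of two microphone frames, with a
    learned per-frequency weighting applied to the cross-power spectrum.
    weights: nonnegative array of length nfft // 2 + 1."""
    X1 = np.fft.rfft(x1, nfft)
    X2 = np.fft.rfft(x2, nfft)
    cross = X1 * np.conj(X2)
    cross /= np.maximum(np.abs(cross), 1e-12)   # PHAT-style normalization
    # Inverse transform gives the correlation over lags; its argmax is the
    # TDOA estimate (lags above nfft // 2 wrap around to negative delays).
    return np.fft.irfft(weights * cross, nfft)

def fit_ridge(features, targets, lam=1.0):
    """Closed-form ridge regression, w = (A'A + lam*I)^{-1} A'b, mapping
    per-bin spectrogram features (n_bins x d) to per-bin
    localization-precision targets (n_bins,)."""
    A, b = features, targets
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
```

Given coefficients w = fit_ridge(F_train, p_train) learned on training bins, the predicted precisions for a new frame, F_new @ w (clipped to be nonnegative), serve as the weights argument to gcc_weighted, and the delay estimate is read off the argmax lag of the returned correlation. The regularizer lam trades fidelity to the training targets against smoothness of the learned weighting.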
