Spatial feature learning for robust binaural sound source localization using a composite feature vector

The performance of binaural speech source localization systems can be significantly degraded by an imperfect selection of spatial localization cues, by the limited bandwidth of speech, and by the effects of noise. To mitigate these effects, this paper presents a novel method that combines a deterministic localization approach with a spatial feature learning process. We (i) derive a composite feature vector by analysing the mutual information between different spatial cues and (ii) estimate the feature combination that minimizes the angular localization error in three-dimensional space. The proposed mutual-information-based feature learning approach is evaluated for a range of speech inputs and noise conditions. We also demonstrate that it improves localization accuracy and robustness relative to traditional localization algorithms, especially at relatively low signal-to-noise ratios.
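As a rough illustration of mutual-information-based cue selection, the sketch below estimates mutual information from a 2-D histogram and ranks candidate cues greedily with a max-relevance, min-redundancy score. This is not the paper's algorithm: the histogram estimator, the greedy scoring rule, and the names `mutual_information`, `rank_cues`, `cues`, and `target` are all assumptions made for illustration only.

```python
import numpy as np


def mutual_information(x, y, bins=16):
    """Histogram-based estimate of I(X;Y) in bits (illustrative estimator)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    p_xy = joint / joint.sum()                 # joint probability table
    p_x = p_xy.sum(axis=1, keepdims=True)      # marginal of x, shape (bins, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)      # marginal of y, shape (1, bins)
    nz = p_xy > 0                              # skip empty cells to avoid log(0)
    return float(np.sum(p_xy[nz] * np.log2(p_xy[nz] / (p_x @ p_y)[nz])))


def rank_cues(cues, target, k, bins=16):
    """Greedily pick k cue names: high MI with the target, low MI with picks so far.

    `cues` maps a hypothetical cue name to a 1-D array of observations;
    `target` is a 1-D array of source directions for the same observations.
    """
    relevance = {name: mutual_information(v, target, bins) for name, v in cues.items()}
    selected = []
    while len(selected) < min(k, len(cues)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = np.mean(
                [mutual_information(cues[name], cues[s], bins) for s in selected]
            )
            return relevance[name] - redundancy
        best = max((n for n in cues if n not in selected), key=score)
        selected.append(best)
    return selected
```

With synthetic data, a cue that varies with the source direction should rank ahead of one that is pure noise; how the selected cues are then combined into a composite feature vector is left open here.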
