Binaural localization of speech sources in the median plane using cepstral HRTF extraction

In binaural systems, source localization in the median plane is challenging because the spectral cues of the head-related transfer function (HRTF) are difficult to exploit independently of the source spectra. This paper presents a method that uses cepstral analysis to extract the HRTF spectral cues for speech source localization in the median plane. Binaural signals are preprocessed in the cepstral domain so that the fine spectral structure of speech and the HRTF spectral envelope can be easily separated. We introduce (i) a truncated cepstral transformation to extract the relevant localization cues, and (ii) a mechanism to normalize the effects of the time-varying speech spectra. The proposed method is evaluated on a multi-speaker speech corpus and compared with a convolution-based localization method. The results suggest that the proposed method fully exploits the available spectral cues for robust, speaker-independent binaural source localization in the median plane.
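The separation described above rests on a general property of the cepstrum: filtering a source by an HRTF multiplies their spectra, so their log-magnitude spectra, and hence their real cepstra, add. The following is a minimal sketch of that property and of a simple low-quefrency truncation, not the paper's exact transformation. The function names, the cutoff `n_keep`, and the use of circular convolution are assumptions made for illustration.

```python
import numpy as np


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse FFT of the log-magnitude spectrum."""
    spectrum = np.fft.rfft(frame)
    # eps guards against log(0); it is an illustrative choice.
    log_magnitude = np.log(np.abs(spectrum) + eps)
    return np.fft.irfft(log_magnitude, n=len(frame))


def truncated_cepstrum(frame, n_keep=32):
    """Keep only the first n_keep (low-quefrency) coefficients.

    Low quefrencies describe the smooth spectral envelope, while
    higher quefrencies carry fine structure such as pitch harmonics.
    n_keep is a hypothetical cutoff, not a value from the paper.
    """
    return real_cepstrum(frame)[:n_keep]


# Synthetic stand-ins for a source frame and an impulse response.
rng = np.random.default_rng(0)
n = 256
source = rng.standard_normal(n)
response = rng.standard_normal(n)

# Circular convolution, computed as a product of spectra.
received = np.fft.irfft(np.fft.rfft(source) * np.fft.rfft(response), n=n)

# Convolution in time becomes addition in the cepstral domain.
additive = np.allclose(
    real_cepstrum(received),
    real_cepstrum(source) + real_cepstrum(response),
    atol=1e-6,
)
print("cepstra add under convolution:", additive)
```

In practice the frames would be windowed segments of the left and right ear signals. The cutoff and the normalization across time-varying speech frames would need to follow the paper's own definitions.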

[18]  F. Keyrouz,et al.  An Enhanced Binaural 3D Sound Localization Algorithm , 2006, 2006 IEEE International Symposium on Signal Processing and Information Technology.