Local relative transfer function for sound source localization

The relative transfer function (RTF), i.e., the ratio of the acoustic transfer functions between two sensors, can be used for sound source localization and beamforming with a microphone array. The RTF is usually defined with respect to a single reference sensor. Choosing this reference sensor can be difficult, especially in dynamic acoustic environments and setups. In this paper we propose to use a locally normalized RTF, in short local-RTF, as an acoustic feature characterizing the source direction. The local-RTF takes a neighboring sensor as the reference channel for a given sensor. The estimated local-RTF vector thus avoids the adverse effects of a noisy single reference and has a smaller estimation error than conventional RTF estimators. We propose two estimators for the local-RTF and concatenate their values across sensors and frequencies to form a high-dimensional vector, which is used for source localization. Experiments with real-world signals demonstrate the interest of this approach.
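As a minimal illustration of the local-RTF idea, the NumPy sketch below estimates, for each sensor, the ratio of transfer functions with respect to a neighboring sensor via a standard cross-PSD over auto-PSD estimate on STFT frames. The paper proposes two specific estimators; this sketch, the synthetic data, and the cyclic choice of the next channel as the neighbor are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic STFT data: M sensors, F frequency bins, T frames.
# Each sensor observes the source spectrum scaled by a (frequency-
# dependent) transfer function, plus additive sensor noise.
M, F, T = 4, 64, 200
a = rng.standard_normal((M, F)) + 1j * rng.standard_normal((M, F))   # transfer functions
s = rng.standard_normal((F, T)) + 1j * rng.standard_normal((F, T))   # source STFT
n = 0.01 * (rng.standard_normal((M, F, T)) + 1j * rng.standard_normal((M, F, T)))
x = a[:, :, None] * s[None, :, :] + n                                # sensor STFTs

def local_rtf(x):
    """Estimate the local-RTF: for sensor m, use the neighboring
    sensor m+1 (cyclically) as the reference channel, and form the
    cross-PSD / auto-PSD ratio averaged over time frames."""
    ref = np.roll(x, -1, axis=0)                  # neighbor channel as reference
    cross = np.mean(x * np.conj(ref), axis=-1)    # cross-PSD estimate
    auto = np.mean(np.abs(ref) ** 2, axis=-1)     # reference auto-PSD estimate
    return cross / auto                           # shape (M, F)

rtf = local_rtf(x)

# The estimate should approach the true ratio a_m / a_{m+1}; the
# concatenation of rtf over sensors and frequencies gives the
# high-dimensional localization feature described in the abstract.
truth = a / np.roll(a, -1, axis=0)
err = np.abs(rtf - truth)
```

With a unique reference, sensor m would instead divide by a single fixed channel, so a noisy reference corrupts every entry of the feature vector; the local normalization confines such errors to the affected sensor pairs.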
