Binaural Sound Source Localization Based on Convolutional Neural Network

Binaural sound source localization (BSSL) in environments with a low signal-to-noise ratio (SNR) and strong reverberation remains a challenging task. In this paper, a novel BSSL algorithm based on a convolutional neural network (CNN) is proposed. The algorithm first extracts a spatial feature from each sub-band of the binaural signal, and then combines the features of all sub-bands within one frame into a two-dimensional feature matrix that is treated as a grey image. To exploit the ability of the CNN to extract high-level features from such an image, the spatial feature matrix of each frame is used as the input for training the CNN model. The trained CNN is then used to predict the azimuth of the sound source. Experiments show that the proposed algorithm significantly improves BSSL localization performance in various acoustic environments, especially under low-SNR and highly reverberant conditions.
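
The feature-assembly step can be illustrated with a minimal sketch. The abstract does not say which spatial feature or sub-band decomposition is used, so the code below makes two assumptions: a peak-normalized cross-correlation between the two ear signals serves as the spatial feature, and uniform FFT-mask bands stand in for the filterbank. The function name `subband_feature_matrix` and the parameters `n_bands` and `max_lag` are hypothetical. Each sub-band contributes one row, and the stacked rows form the two-dimensional matrix that would be passed to a CNN as a single-channel image.

```python
import numpy as np

def subband_feature_matrix(left, right, n_bands=8, max_lag=16):
    """Stack one cross-correlation row per sub-band into a 2-D matrix.

    Assumptions (not taken from the paper): uniform FFT-mask sub-bands and
    a peak-normalized cross-correlation as the per-band spatial feature.
    """
    n = len(left)
    n_fft = 2 * n  # zero-pad so the circular correlation has no wrap-around
    spec_l = np.fft.rfft(left, n_fft)
    spec_r = np.fft.rfft(right, n_fft)
    # Bin indices that split the spectrum into n_bands contiguous bands.
    edges = np.linspace(0, len(spec_l), n_bands + 1).astype(int)

    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Cross-spectrum restricted to the current sub-band.
        cross = np.zeros_like(spec_l)
        cross[lo:hi] = spec_l[lo:hi] * np.conj(spec_r[lo:hi])
        cc = np.fft.irfft(cross, n_fft)
        # Reorder so the columns run over lags -max_lag .. +max_lag.
        cc = np.concatenate([cc[-max_lag:], cc[:max_lag + 1]])
        rows.append(cc / (np.max(np.abs(cc)) + 1e-12))
    return np.stack(rows)  # shape: (n_bands, 2 * max_lag + 1)

# Synthetic frame: white noise in which the right channel lags the left.
rng = np.random.default_rng(0)
delay = 3
source = rng.standard_normal(512 + delay)
left, right = source[delay:], source[:512]

features = subband_feature_matrix(left, right)
# Lag at which the band-summed correlation peaks (column index minus max_lag).
peak_lag = int(np.argmax(features.sum(axis=0))) - 16
print(features.shape, peak_lag)
```

In this sketch the rows index sub-bands and the columns index interaural lags, so an interaural delay shows up as a vertical ridge in the image. Whether the original work uses this layout, this feature, or additional cues such as level differences is not stated in the abstract.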
