Blind Sound Source Localization based on Deep Learning

This paper presents SSLIDE (Sound Source Localization for Indoors using DEep learning), which applies deep neural networks (DNNs) with an encoder-decoder structure to localize sound sources without any prior information about candidate source locations or source properties. Spatial features are extracted from the signal received at each microphone and represented as likelihood surfaces, which give the likelihood of a source being present at each point. The DNN consists of an encoder network followed by two decoders. The encoder produces a compressed representation of the input likelihood surfaces. One decoder resolves the multipath caused by reverberation, and the other estimates the source location. Experiments show that the method can outperform multiple signal classification (MUSIC), steered response power with phase transform (SRP-PHAT), sparse Bayesian learning (SBL), and a competing convolutional neural network (CNN) approach in reverberant environments.
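The encoder-plus-two-decoders layout described above can be sketched in PyTorch as follows. This is only an illustrative sketch, not the SSLIDE network itself: the class name `TwoDecoderNet`, the layer types and sizes, the four-microphone input, the 32 x 32 grid, and the form of each decoder's output are all assumptions, since the abstract does not specify them.

```python
import torch
from torch import nn


class TwoDecoderNet(nn.Module):
    """Shared encoder feeding two decoders (hypothetical layer sizes)."""

    def __init__(self, num_mics: int = 4, hidden: int = 16):
        super().__init__()
        # Encoder: compresses the stacked per-microphone likelihood surfaces.
        self.encoder = nn.Sequential(
            nn.Conv2d(num_mics, hidden, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden, hidden * 2, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder 1 (assumed output): one cleaned surface per microphone,
        # standing in for the decoder that resolves multipath.
        self.multipath_decoder = self._make_decoder(hidden, num_mics)
        # Decoder 2 (assumed output): a single map scoring source locations.
        self.location_decoder = self._make_decoder(hidden, 1)

    @staticmethod
    def _make_decoder(hidden: int, out_channels: int) -> nn.Sequential:
        # Two transposed convolutions undo the encoder's two stride-2 steps.
        return nn.Sequential(
            nn.ConvTranspose2d(hidden * 2, hidden, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(hidden, out_channels, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, surfaces: torch.Tensor):
        code = self.encoder(surfaces)  # compressed representation
        return self.multipath_decoder(code), self.location_decoder(code)


# Random data standing in for likelihood surfaces: batch of 2, 4 mics, 32 x 32 grid.
grid = 32
model = TwoDecoderNet(num_mics=4)
surfaces = torch.rand(2, 4, grid, grid)
with torch.no_grad():
    cleaned, location_map = model(surfaces)

# Take the grid cell with the highest score as the location estimate.
flat_idx = location_map.flatten(1).argmax(dim=1)
rows, cols = flat_idx // grid, flat_idx % grid
print(cleaned.shape, location_map.shape, rows.tolist(), cols.tolist())
```

Sharing one encoder means both decoders work from the same compressed representation, so in a setup like this the two outputs would typically be trained jointly with one loss per decoder. The choice of losses is not stated in the abstract and is left out here.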
