Sound Event Localization Based on Sound Intensity Vector Refined by DNN-Based Denoising and Source Separation

We propose a direction-of-arrival (DOA) estimation method for Sound Event Localization and Detection (SELD). Directly estimating DOA with a deep neural network (DNN), i.e., a completely data-driven approach, achieves high accuracy. However, the accuracy of such DNN-based methods for overlapping sources lags behind their accuracy for single sources, because they cannot incorporate physical knowledge. Physics-based approaches, in contrast, are less accurate than DNN-based approaches but are robust to overlapping sources. In this study, we combine the two: the sound intensity vectors (IVs) used for physics-based DOA estimation are refined by DNN-based denoising and source separation. This enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array. Experimental results show that the proposed method achieves state-of-the-art DOA estimation accuracy on an open SELD dataset.
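The idea of refining IVs with masks before physics-based DOA estimation can be illustrated with a minimal NumPy sketch. It assumes the array signals have already been converted to ideal first-order ambisonic (FOA) STFT channels W, X, Y, Z with normalization constants ignored, computes per-bin pseudo-intensity vectors, and converts a mask-weighted sum of them into azimuth and elevation. The helpers `encode_foa`, `intensity_vectors` and `doa_from_ivs` are hypothetical, and an oracle mask stands in for the DNN-estimated denoising/separation masks; this is an illustration under these assumptions, not the authors' implementation.

```python
import numpy as np


def encode_foa(S, az_deg, el_deg):
    """Ideal FOA encoding of a plane wave with STFT S (normalization ignored)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    return (S,
            S * np.cos(az) * np.cos(el),
            S * np.sin(az) * np.cos(el),
            S * np.sin(el))


def intensity_vectors(W, X, Y, Z):
    """Per-bin pseudo-intensity vectors Re{conj(W) * [X, Y, Z]}, shape (3, T, F).

    With the encoding above, each vector points toward the source.
    """
    return np.real(np.conj(W)[None] * np.stack([X, Y, Z]))


def doa_from_ivs(iv, mask=None):
    """Mask-weighted sum of IVs converted to (azimuth, elevation) in degrees."""
    if mask is None:
        mask = np.ones(iv.shape[1:])
    # Sum the (masked) IVs over all time-frequency bins.
    v = (iv * mask[None]).sum(axis=(1, 2))
    azimuth = np.degrees(np.arctan2(v[1], v[0]))
    elevation = np.degrees(np.arctan2(v[2], np.hypot(v[0], v[1])))
    return azimuth, elevation


# Two sources occupying disjoint time-frequency bins (an idealized overlap).
rng = np.random.default_rng(0)
shape = (50, 64)  # (frames, frequency bins)
mask1 = rng.random(shape) < 0.5  # bins dominated by source 1
S1 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * mask1
S2 = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) * ~mask1
foa = [a + b for a, b in zip(encode_foa(S1, 60.0, 20.0),
                             encode_foa(S2, -90.0, 0.0))]

iv = intensity_vectors(*foa)
# The oracle mask stands in for a DNN-estimated separation mask (hypothetical).
az, el = doa_from_ivs(iv, mask1.astype(float))
print(round(az, 2), round(el, 2))  # close to 60 and 20
```

Without the mask (`doa_from_ivs(iv)`), the summed vector mixes both sources and the estimated azimuth is pulled far away from 60°, which illustrates why separating the IVs of overlapping sources before averaging helps.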