Paper information - Sound Event Localization Based on Sound Intensity Vector Refined by DNN-Based Denoising and Source Separation

We propose a direction-of-arrival (DOA) estimation method for sound event localization and detection (SELD). Direct DOA estimation with a deep neural network (DNN), i.e., a completely data-driven approach, achieves high accuracy. However, there is an accuracy gap between DOA estimation for single and overlapping sources, because such approaches cannot incorporate physical knowledge. Physics-based approaches, meanwhile, are less accurate than DNN-based approaches but are robust to overlapping sources. In this study, we combine the two: the sound intensity vectors (IVs) used for physics-based DOA estimation are refined by DNN-based denoising and source separation. This method enables accurate DOA estimation for both single and overlapping sources using a spherical microphone array. Experimental results show that the proposed method achieves state-of-the-art DOA estimation accuracy on an open SELD dataset.
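The physics-based step mentioned in the abstract, estimating a DOA from sound intensity vectors and refining those vectors with a time-frequency mask, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes ideal first-order Ambisonics signals (W, X, Y, Z) with normalization constants ignored, and it substitutes an oracle binary mask for the DNN-estimated denoising and separation masks. The helper names `encode_plane_wave`, `intensity_vector`, and `doa_from_iv` are hypothetical.

```python
import numpy as np


def encode_plane_wave(s, az_deg, el_deg):
    """Hypothetical helper: encode an STFT `s` as an ideal first-order
    Ambisonics plane wave (normalization constants ignored)."""
    az, el = np.radians(az_deg), np.radians(el_deg)
    w = s
    x = s * np.cos(az) * np.cos(el)
    y = s * np.sin(az) * np.cos(el)
    z = s * np.sin(el)
    return w, x, y, z


def intensity_vector(w, x, y, z):
    """Per-bin intensity vector Re(conj(W) * [X, Y, Z]), shape (3, F, T).
    With the encoding above it points toward the source."""
    return np.real(np.conj(w)[None] * np.stack([x, y, z]))


def doa_from_iv(iv, mask=None):
    """Sum the (optionally mask-weighted) IVs over all bins and convert
    the resulting vector to azimuth and elevation in degrees."""
    if mask is not None:
        iv = iv * mask[None]
    ix, iy, iz = iv.reshape(3, -1).sum(axis=1)
    azimuth = np.degrees(np.arctan2(iy, ix))
    elevation = np.degrees(np.arctan2(iz, np.hypot(ix, iy)))
    return azimuth, elevation


# Demo: two overlapping synthetic sources (assumed directions).
rng = np.random.default_rng(0)
shape = (64, 50)  # (frequency bins, time frames)
s1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
s2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
mix = [a + b for a, b in zip(encode_plane_wave(s1, 40.0, 20.0),
                             encode_plane_wave(s2, -100.0, -10.0))]
iv = intensity_vector(*mix)

# Oracle mask standing in for a DNN output: keep bins dominated by source 1.
oracle_mask = (np.abs(s1) > np.abs(s2)).astype(float)

print("unmasked DOA:", doa_from_iv(iv))
print("masked DOA  :", doa_from_iv(iv, oracle_mask))
```

Without a mask, the summed intensity vector mixes contributions from both sources, whereas the mask biases the sum toward bins dominated by one source. This is the intuition behind refining IVs before computing the DOA; the DNN architecture and mask definitions the paper actually uses are not given in this record.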
