A Dataset of Reverberant Spatial Sound Scenes with Moving Sources for Sound Event Localization and Detection

This report presents the dataset and the evaluation setup of the Sound Event Localization and Detection (SELD) task for the DCASE 2020 Challenge. The SELD task is the problem of jointly classifying sound events from a known set of classes, detecting their temporal activity, and estimating their spatial directions or locations while they are active. Training and testing SELD systems requires datasets of diverse sound events occurring under realistic acoustic conditions. Compared to the previous challenge, a significantly more complex dataset was created for DCASE 2020. The two key differences are a more diverse range of acoustical conditions and dynamic conditions, i.e. moving sources. The spatial sound scenes are created using real room impulse responses (RIRs) captured in a continuous manner with a slowly moving excitation source, and both static and moving sound events are synthesized from them. Ambient noise recorded on location is then added to complete each scene recording. The dataset is accompanied by a baseline SELD method based on a convolutional recurrent neural network (CRNN), which provides benchmark scores for the task. The baseline is an updated version of the one used in the previous challenge, with modified input features and training procedure to improve its performance.
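The scene generation described above (events spatialized with RIRs measured along a source trajectory, then mixed with recorded ambient noise) can be illustrated with a minimal NumPy sketch. It assumes the measured RIRs have already been arranged on a regular grid along the trajectory, one every `hop` samples. It uses a simple windowed overlap-add: each Hann-windowed block of the dry event is convolved with the RIR at its position. This is a swapped-in illustration, not the authors' exact procedure, which the abstract does not describe. The function names `synthesize_moving_source` and `add_ambient_noise`, the array layout, and the `snr_db` mixing rule are hypothetical.

```python
import numpy as np


def synthesize_moving_source(signal, rirs, hop):
    """Spatialize a dry mono event with a sequence of multichannel RIRs.

    signal: (T,) dry mono event.
    rirs:   (K, C, L) RIRs along the trajectory; RIR k is assumed to be
            valid around sample k * hop (hypothetical layout).
    hop:    spacing between RIRs in samples; frames are 2 * hop long.
    Returns a (C, T + L - 1) multichannel signal.
    """
    K, C, L = rirs.shape
    T = len(signal)
    n_win = 2 * hop
    # Periodic Hann window: at 50% overlap the windows sum to one, so a
    # static source (all RIRs equal) reduces to plain convolution.
    win = np.hanning(n_win + 1)[:-1]
    n_frames = int(np.ceil(T / hop)) + 1
    # Pad so that every signal sample is covered by two overlapping frames.
    x = np.concatenate([np.zeros(hop), signal, np.zeros(n_frames * hop - T)])
    y = np.zeros((C, len(x) + L - 1))
    for k in range(n_frames):
        start = k * hop
        seg = x[start:start + n_win] * win
        h = rirs[min(k, K - 1)]  # RIR at this frame's centre (clipped)
        for c in range(C):
            y[c, start:start + n_win + L - 1] += np.convolve(seg, h[c])
    # Drop the leading padding and keep the full reverberant tail.
    return y[:, hop:hop + T + L - 1]


def add_ambient_noise(scene, noise, snr_db):
    """Scale recorded noise to a target scene-to-noise ratio and add it."""
    p_scene = np.mean(scene ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_scene / (p_noise * 10 ** (snr_db / 10)))
    return scene + gain * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    event = rng.standard_normal(24000)  # stand-in for a dry event
    decay = np.exp(-np.arange(512) / 100)
    rirs = rng.standard_normal((20, 4, 512)) * decay  # toy 4-channel RIRs
    scene = synthesize_moving_source(event, rirs, hop=1200)
    noisy = add_ambient_noise(scene, rng.standard_normal(scene.shape), 20)
    print(noisy.shape)  # (4, 24511)
```

Because the windows overlap-add to one, the sketch reduces to ordinary convolution when every RIR is identical, which corresponds to a static event. With distinct RIRs, adjacent blocks crossfade between measurement positions, so a smaller `hop` gives smoother motion at the cost of more convolutions.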
