SOUND EVENT DETECTION IN DOMESTIC ENVIRONMENTS USING ENSEMBLE OF CONVOLUTIONAL RECURRENT NEURAL NETWORKS
Technical Report

In this paper, we present a method for detecting sound events in domestic environments using a small weakly labeled dataset, a large unlabeled dataset, and a strongly labeled synthetic dataset, as specified in task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2019 challenge. To solve the problem, we use a convolutional recurrent neural network (CRNN), which stacks a convolutional neural network (CNN) and a bi-directional gated recurrent unit (Bi-GRU). Moreover, we apply several techniques to improve performance: data augmentation, event activity detection, multi-median filtering, a mean-teacher student model, and an ensemble of neural networks. Combining these techniques improves sound event detection performance over the baseline algorithm. The performance evaluation shows that the proposed method achieves 40.89% on event-based metrics and 66.17% on segment-based metrics.
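To make the mean-teacher student model mentioned above more concrete, the following is a minimal sketch of its two generic ingredients: the teacher weights are kept as an exponential moving average (EMA) of the student weights, and a consistency cost penalizes disagreement between student and teacher predictions on the same input. This is an illustration only, not the report's implementation: the function names `ema_update` and `consistency_loss`, the smoothing factor `alpha`, the use of mean squared error, and the flat dictionary of scalar weights are all assumptions chosen for brevity.

```python
def ema_update(teacher, student, alpha=0.999):
    """Return new teacher weights: alpha * teacher + (1 - alpha) * student.

    `teacher` and `student` are hypothetical dicts mapping a parameter
    name to a scalar weight; a real model would hold tensors instead.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions.

    Both arguments are equal-length lists of class probabilities for the
    same (possibly unlabeled) clip. MSE is an assumed choice of distance.
    """
    if len(student_probs) != len(teacher_probs):
        raise ValueError("prediction vectors must have the same length")
    squared = [(s - t) ** 2 for s, t in zip(student_probs, teacher_probs)]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Toy weights: the teacher drifts slowly toward the student.
    teacher = {"w": 1.0, "b": 0.0}
    student = {"w": 0.0, "b": 1.0}
    for _ in range(3):
        teacher = ema_update(teacher, student, alpha=0.9)
    print(teacher)

    # Consistency cost on an unlabeled example needs no ground truth.
    print(consistency_loss([0.2, 0.8], [0.4, 0.6]))
```

Because the consistency cost needs no labels, it can be computed on the large unlabeled set, while the weakly and strongly labeled sets contribute ordinary classification losses to the student.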
