Soft-Median Choice: An Automatic Feature Smoothing Method for Sound Event Detection

In existing Sound Event Detection (SED) algorithms, the roughness of the extracted features lowers both precision and recall. To address this problem, a novel automatic feature smoothing method based on Soft-Median Choice is proposed. First, one-dimensional (1-D) convolutional layers are added to the feature extractor of the Convolutional Recurrent Neural Network (CRNN) to capture more temporal information. Second, a novel Median Choice module is inserted into the CRNN. It consists of median filters and a Linear Choice layer, which automatically learns from features at different smoothing levels. Third, a Soft-Median function is designed to replace the median function. Whereas the median passes gradient through only one element of each window, Soft-Median uses all of the elements, which opens the path for gradient flow and helps the network converge. Finally, the classifier uses Linear Softmax pooling instead of an attention module, avoiding the unnecessary false positives that the attention module causes. Evaluations show that the proposed method obtains significantly better scores than the reference algorithms.
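The abstract does not give the exact Soft-Median formula, so the following is only a minimal pure-Python sketch of the ideas under stated assumptions. `soft_median` is a hypothetical differentiable median: each element in a window is scored by its total absolute deviation from the others, and a softmin over those scores (controlled by a hypothetical `temperature`) weights every element instead of selecting one. `smooth` and `median_choice` are hypothetical helpers that apply it along time at several window sizes and mix the results linearly, standing in for the Linear Choice layer whose weights would be learned in the real network. `linear_softmax_pool` is the standard linear softmax pooling function, which maps frame-level probabilities p_t to a clip-level score sum(p_t^2) / sum(p_t).

```python
import math
from typing import List, Sequence


def soft_median(window: Sequence[float], temperature: float = 0.1) -> float:
    """Hypothetical differentiable stand-in for the median of `window`.

    Each element is scored by its total absolute deviation from the window.
    For an odd-length window the lowest-scoring element is the median, so a
    softmin over the scores approaches the median as temperature -> 0 and
    the mean as temperature -> infinity. Every element gets a nonzero weight,
    so gradient can reach all of them (unlike the hard median).
    """
    scores = [sum(abs(x - y) for y in window) for x in window]
    best = min(scores)  # shift scores for numerical stability
    weights = [math.exp(-(s - best) / temperature) for s in scores]
    return sum(w * x for w, x in zip(weights, window)) / sum(weights)


def smooth(sequence: Sequence[float], window_size: int,
           temperature: float = 0.1) -> List[float]:
    """Slide a soft-median filter of odd `window_size` along time (edge-padded)."""
    if not sequence:
        return []
    half = window_size // 2
    padded = [sequence[0]] * half + list(sequence) + [sequence[-1]] * half
    return [soft_median(padded[t:t + window_size], temperature)
            for t in range(len(sequence))]


def median_choice(sequence: Sequence[float], window_sizes: Sequence[int],
                  choice_weights: Sequence[float],
                  temperature: float = 0.1) -> List[float]:
    """Linearly combine several smoothing levels (assumed form of Linear Choice).

    In a trained network `choice_weights` would be learned parameters;
    here they are passed in by hand. A window size of 1 keeps the raw feature.
    """
    branches = [smooth(sequence, k, temperature) for k in window_sizes]
    return [sum(w * branch[t] for w, branch in zip(choice_weights, branches))
            for t in range(len(sequence))]


def linear_softmax_pool(frame_probs: Sequence[float]) -> float:
    """Linear softmax pooling: sum(p_t^2) / sum(p_t) over frames."""
    total = sum(frame_probs)
    return 0.0 if total == 0.0 else sum(p * p for p in frame_probs) / total


# Usage: an isolated one-frame spike is suppressed by a 3-frame soft-median.
frames = [0.0, 0.0, 5.0, 0.0, 0.0]
print(smooth(frames, 3, temperature=0.01))
print(median_choice(frames, window_sizes=(1, 3), choice_weights=(0.5, 0.5)))
print(linear_softmax_pool([0.0, 0.5, 1.0]))
```

In this sketch the temperature trades robustness for gradient spread: a small value behaves almost like a hard median filter, while a larger value lets more elements of each window share the gradient at the cost of smoothing closer to a moving average.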
