Paper Information - An Effective Perturbation Based Semi-Supervised Learning Method for Sound Event Detection

Mean teacher based methods are increasingly achieving state-of-the-art performance on large-scale weakly labeled and unlabeled sound event detection (SED) tasks in recent DCASE challenges. By penalizing inconsistent predictions under different perturbations, mean teacher methods can exploit large-scale unlabeled data in a self-ensembling manner. In this paper, an effective perturbation based semi-supervised learning (SSL) method is proposed that builds on the mean teacher method. Specifically, a new independent component (IC) module, designed as a combination of batch normalization and DropBlock operations, is proposed to introduce perturbations at different convolutional layers. The proposed IC module can reduce the correlation between neurons and thereby improve performance. A global statistics pooling based attention module is further proposed to explicitly model inter-dependencies between the time-frequency domain and channels, using statistics (e.g. mean, standard deviation, max) computed along different dimensions. This provides an effective attention mechanism that adaptively re-calibrates the output feature map. Experimental results on Task 4 of the DCASE2018 challenge demonstrate the superiority of the proposed method, which achieves an F1-score of about 39.8%, outperforming the previous winning system’s 32.4% by a significant margin.
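The abstract gives no formulas, so the following is only a minimal NumPy sketch of two ideas it mentions, under stated assumptions. It assumes the usual mean teacher formulation, in which the teacher weights are an exponential moving average (EMA) of the student weights and the consistency penalty is a mean squared error between the two models' predictions. It also assumes that the pooled statistics are taken per channel over the time and frequency axes of a (channels, time, frequency) feature map. The function names, the `alpha` value, and the toy numbers are hypothetical and are not taken from the paper; the IC module and the attention re-calibration step are not shown.

```python
import numpy as np


def ema_update(teacher, student, alpha=0.999):
    """Return teacher weights moved toward the student weights by an EMA step."""
    return {name: alpha * teacher[name] + (1.0 - alpha) * student[name]
            for name in teacher}


def consistency_loss(student_probs, teacher_probs):
    """Mean squared difference between student and teacher predictions."""
    return float(np.mean((student_probs - teacher_probs) ** 2))


def global_statistics(feature_map):
    """Per-channel mean, standard deviation and max over time and frequency.

    feature_map has shape (channels, time, frequency) (an assumed layout);
    the result has shape (channels, 3).
    """
    flat = feature_map.reshape(feature_map.shape[0], -1)
    return np.stack([flat.mean(axis=1), flat.std(axis=1), flat.max(axis=1)], axis=1)


# Toy usage with made-up numbers.
student = {"w": np.array([1.0, 2.0])}
teacher = {"w": np.array([0.0, 0.0])}
teacher = ema_update(teacher, student, alpha=0.9)  # one EMA step toward the student

# Predictions of the two models for the same clip under different perturbations.
loss = consistency_loss(np.array([0.8, 0.2]), np.array([0.6, 0.4]))

# Statistics of a small (2 channels, 3 frames, 4 bins) feature map.
stats = global_statistics(np.arange(24, dtype=float).reshape(2, 3, 4))
```

In an attention module of the kind the abstract describes, statistics like these would presumably be fed to a small learned layer that produces re-calibration weights for the feature map; that step depends on details the abstract does not give.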
