Couple Learning: Mean Teacher method with pseudo-labels improves semi-supervised deep learning results

The recently proposed Mean Teacher method has achieved state-of-the-art results on several semi-supervised learning benchmarks. It exploits large-scale unlabeled data in a self-ensembling manner. In this paper, we propose an effective Couple Learning method that combines a well-trained model with a Mean Teacher model. The proposed pseudo-label generation model (PLG) enlarges both the strongly labeled and the weakly labeled training data, which improves the performance of the Mean Teacher method. The Mean Teacher method, in turn, suppresses noise in the pseudo-labeled data. Together, they allow Couple Learning to extract more information from the compound training data. Experimental results on Task 4 of the DCASE2020 challenge demonstrate the superiority of the proposed method, which achieves an F1-score of about 39.18% on the public evaluation set, outperforming the baseline system's 37.12% by about 2.06 percentage points.
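The abstract does not spell out the training procedure, so what follows is only a minimal, hypothetical sketch of the two ingredients it combines. It works on plain Python lists instead of a deep-learning framework. The first ingredient is the standard Mean Teacher mechanism: the teacher's weights are an exponential moving average (EMA) of the student's weights, and a consistency cost penalizes disagreement between their predictions. The second is one possible way a pseudo-label generator could turn a trained model's frame-level class probabilities into strong (frame-level) and weak (clip-level) labels. The function names, the EMA decay `alpha=0.999`, and the 0.5 threshold are illustrative assumptions, not the PLG procedure used in the paper.

```python
from typing import List, Sequence, Tuple


def ema_update(teacher: Sequence[float], student: Sequence[float],
               alpha: float = 0.999) -> List[float]:
    """Mean Teacher step: each teacher weight becomes an exponential moving
    average of the corresponding student weight (alpha is an assumed decay)."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]


def consistency_cost(student_probs: Sequence[float],
                     teacher_probs: Sequence[float]) -> float:
    """Mean squared error between student and teacher predictions.
    It needs no labels, so it can be computed on unlabeled clips."""
    n = len(student_probs)
    return sum((s - t) ** 2 for s, t in zip(student_probs, teacher_probs)) / n


def pseudo_labels(frame_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> Tuple[List[List[int]], List[int]]:
    """Hypothetical pseudo-label step (not the paper's PLG): binarize a trained
    model's frame-by-class probabilities into strong labels, then mark a class
    as present in the clip (weak label) if any frame is active for it."""
    strong = [[1 if p >= threshold else 0 for p in frame] for frame in frame_probs]
    n_classes = len(frame_probs[0])
    weak = [max(row[c] for row in strong) for c in range(n_classes)]
    return strong, weak


if __name__ == "__main__":
    print(ema_update([0.0, 1.0], [1.0, 1.0], alpha=0.5))   # [0.5, 1.0]
    print(consistency_cost([1.0, 0.0], [0.5, 0.0]))        # 0.125
    # Three frames, two classes: class 0 is active in the first two frames only.
    print(pseudo_labels([[0.9, 0.1], [0.7, 0.2], [0.3, 0.4]]))
    # ([[1, 0], [1, 0], [0, 0]], [1, 0])
```

In a typical Mean Teacher setup, the student is updated by gradient descent on a supervised loss (here it would be computed over the real and pseudo-labeled clips) plus a weighted consistency cost. `ema_update` is then applied to every teacher parameter after each optimizer step. Because the teacher averages the student over many steps, its predictions tend to be more stable, which is the property the abstract relies on to dampen noisy pseudo-labels.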
