A Scene-Dependent Sound Event Detection Approach Using Multi-Task Learning

Sound event detection (SED) and acoustic scene classification (ASC) are two closely related tasks in computational auditory scene analysis. For example, during sound event detection, scene information can be used to rule out sound events that are unlikely to occur in a given scene, and can thereby improve detection accuracy. However, existing work rarely takes acoustic scene information into account when detecting sound events. Based on the intrinsic relationship between sound events and acoustic scenes, this paper proposes a scene-dependent sound event detection (SDSED) approach that combines scene information and sound event information through multi-task learning. In the proposed approach, the two tasks share a common feature representation and are learned jointly. In addition, a temporal attention mechanism is used to extract informative features from sound recordings. We evaluate the proposed approach on the Synthetic Sound Scenes dataset. Experimental results show that it outperforms state-of-the-art approaches: compared with the reference approach, it improves the segment-based F-score by 4.29% and reduces the segment-based error rate by 4.8%.
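
To make the multi-task idea concrete, the following is a minimal sketch of a forward pass in which frame-level event detection and clip-level scene classification read from the same shared features, with softmax attention over time pooling the frames for the scene branch. It is an illustration only, not the architecture of the proposed approach: the abstract does not specify the layers, the form of the attention, or the loss, so the function names, the toy parameters, the linear heads, and the `scene_weight` trade-off are all assumptions.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def temporal_attention_pool(frames, attn_vec):
    # Score each frame, normalize the scores over time, and return the
    # attention-weighted sum of frames as one clip-level vector.
    weights = softmax([dot(f, attn_vec) for f in frames])
    dim = len(frames[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


def forward(frames, params):
    # `frames` stands in for the shared feature representation (assumed to be
    # the output of shared layers that are not modeled here).
    # Event branch: one independent sigmoid per event class and per frame.
    event_probs = [[sigmoid(dot(f, w)) for w in params["event_w"]] for f in frames]
    # Scene branch: attention pooling over time, then a softmax over scenes.
    pooled, weights = temporal_attention_pool(frames, params["attn"])
    scene_probs = softmax([dot(pooled, w) for w in params["scene_w"]])
    return event_probs, scene_probs, weights


def joint_loss(event_probs, event_targets, scene_probs, scene_target, scene_weight=0.5):
    # Assumed multi-task objective: mean binary cross-entropy over frames and
    # event classes, plus a weighted cross-entropy for the scene label.
    eps = 1e-7
    bce, count = 0.0, 0
    for p_row, t_row in zip(event_probs, event_targets):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)
            bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
            count += 1
    scene_ce = -math.log(max(scene_probs[scene_target], eps))
    return bce / count + scene_weight * scene_ce


# Toy example: 3 frames of 3-dimensional shared features, 2 event classes, 2 scenes.
frames = [[0.2, -0.1, 0.4], [1.0, 0.3, -0.5], [0.0, 0.8, 0.1]]
params = {
    "event_w": [[0.5, -0.2, 0.1], [-0.3, 0.7, 0.2]],
    "scene_w": [[0.1, 0.2, 0.3], [-0.1, 0.4, 0.0]],
    "attn": [0.3, -0.6, 0.9],
}
event_targets = [[1, 0], [0, 0], [0, 1]]
event_probs, scene_probs, attn_weights = forward(frames, params)
loss = joint_loss(event_probs, event_targets, scene_probs, scene_target=0)
```

In a trained model, gradients from both terms of such a joint loss would flow into the shared layers, which is how scene information could influence the features used for event detection; the value of `scene_weight` here is arbitrary.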