DATA AUGMENTATION AND PRIOR KNOWLEDGE-BASED REGULARIZATION FOR SOUND EVENT LOCALIZATION AND DETECTION
Technical Report

The goal of sound event localization and detection (SELD) is to detect polyphonic sound events and, at the same time, localize their sources. In this paper, we propose a complete pipeline for the SELD task consisting of data augmentation, network prediction, and post-processing stages. In the data augmentation stage, we expand the official dataset with SpecAugment [1]. In the network prediction stage, we train the event detection network and the localization network separately and use the predicted event activity to output localization predictions for active frames. In the post-processing stage, we propose a prior knowledge-based regularization (PKR), which computes the average of the localization predictions over each event segment and replaces the predictions for that segment with this average. We prove theoretically that this technique can reduce the mean squared error (MSE). Evaluated on the DCASE 2019 Challenge Task 3 Development Dataset, our system achieves approximately a 59% reduction in SED error rate (ER) and a 13% reduction in direction-of-arrival (DOA) error relative to the baseline system (on the Ambisonic dataset).
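The segment-averaging step of PKR can be illustrated with a minimal sketch. It is not the report's implementation: the function name `pkr_smooth`, the array layout (one event class, frame-wise activity flags, and per-frame angle predictions), and the use of a plain arithmetic mean are assumptions made for illustration. In particular, a plain mean ignores azimuth wrap-around at ±180°, which a real system would need to handle.

```python
import numpy as np


def pkr_smooth(active, doa):
    """Replace frame-wise DOA predictions with the mean of each active segment.

    Hypothetical layout (an assumption, not taken from the report):
      active: (T,) boolean-like array, frame-wise activity of one event class
      doa:    (T, D) array, frame-wise localization predictions,
              e.g. D = 2 for (azimuth, elevation) in degrees
    """
    active = np.asarray(active, dtype=bool)
    doa = np.asarray(doa, dtype=float)
    out = doa.copy()

    # Pad with False so that every segment has a rising and a falling edge.
    padded = np.concatenate(([False], active, [False]))
    edges = np.diff(padded.astype(int))
    starts = np.flatnonzero(edges == 1)   # first frame of each segment
    ends = np.flatnonzero(edges == -1)    # one past the last frame

    for s, e in zip(starts, ends):
        # Plain arithmetic mean over the segment; angle wrap-around is ignored.
        out[s:e] = doa[s:e].mean(axis=0)
    return out


# Toy usage: two active segments (frames 1-3 and 5-6) for one event class.
active = [0, 1, 1, 1, 0, 1, 1]
doa = [[0, 0], [10, 2], [20, 4], [30, 6], [0, 0], [5, 1], [7, 3]]
smoothed = pkr_smooth(active, doa)
```

If the true direction is constant over a segment, the squared error of the segment mean is no larger than the average squared error of the individual frame predictions (Jensen's inequality), which gives one intuition for why averaging can lower the MSE. Inactive frames are left unchanged here.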