ACOUSTIC SCENE CLASSIFICATION BASED ON BINAURAL DEEP SCATTERING SPECTRA WITH CNN AND LSTM
Technical Report

This technical report presents the system proposed by the Modern Communications Technology Laboratory at the Beijing Institute of Technology for acoustic scene classification in DCASE2018 Task 1A. Compared with previous years, the data is more diverse, which makes the task more difficult. To address this, we use Deep Scattering Spectra (DSS) features. Traditional features such as Mel-frequency Cepstral Coefficients (MFCC) often lose information at high frequencies, whereas DSS preserves it. Based on these features, we propose a network model that combines a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM) network to classify acoustic scenes. Experimental results show that the proposed feature extraction method and network structure are effective for this task: classification accuracy increased from 59% to 76%.