Paper information - ACOUSTIC SCENE CLASSIFICATION BASED ON BINAURAL DEEP SCATTERING SPECTRA WITH CNN AND LSTM Technical Report

This technical report presents the system submitted by the Modern Communications Technology Laboratory at the Beijing Institute of Technology for acoustic scene classification in DCASE2018 Task 1A. Compared with previous years, the data are more diverse, which makes the task more difficult. To address this, we use Deep Scattering Spectra (DSS) features. Traditional features such as Mel-frequency Cepstral Coefficients (MFCC) often lose information at high frequencies, whereas DSS preserves it. Building on these features, we propose a network that combines a Convolutional Neural Network (CNN) with Long Short-Term Memory (LSTM) to classify acoustic scenes. In our experiments, the proposed feature extraction method and network structure raised classification accuracy from 59% to 76%.
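The abstract does not spell out the feature pipeline, so the following is only a rough NumPy sketch of the general scattering idea, not the authors' implementation. It assumes Gaussian band-pass filters as a stand-in for a wavelet filter bank, and the names `gaussian_filter`, `apply_filter`, `scattering_sketch`, and `am_tone`, along with all filter widths and test frequencies, are hypothetical choices made for illustration. The first-order output averages each band's envelope over time; the second-order output filters that envelope again before averaging, which keeps fast modulations that the averaging would otherwise smooth away.

```python
import numpy as np


def gaussian_filter(n, sr, center_hz, sigma_hz):
    """Frequency response of a Gaussian filter sampled on the FFT grid."""
    freqs = np.fft.fftfreq(n, d=1.0 / sr)
    return np.exp(-0.5 * ((freqs - center_hz) / sigma_hz) ** 2)


def apply_filter(x, response):
    """Filter x by multiplying its spectrum with the given response."""
    return np.fft.ifft(np.fft.fft(x) * response)


def scattering_sketch(x, sr, carrier_hz, mod_hz, lowpass_hz=10.0):
    """Toy first- and second-order scattering-style coefficients.

    Assumption: Gaussian band-pass filters (width = center / 4) replace
    the wavelet filter bank of a real scattering transform.
    Returns s1 with shape (len(carrier_hz), n) and
    s2 with shape (len(carrier_hz), len(mod_hz), n).
    """
    n = len(x)
    phi = gaussian_filter(n, sr, 0.0, lowpass_hz)  # time-averaging low-pass
    s1 = np.empty((len(carrier_hz), n))
    s2 = np.empty((len(carrier_hz), len(mod_hz), n))
    for i, fc in enumerate(carrier_hz):
        # First layer: band-pass, then modulus gives the band envelope.
        u1 = np.abs(apply_filter(x, gaussian_filter(n, sr, fc, fc / 4.0)))
        s1[i] = apply_filter(u1, phi).real
        for j, fm in enumerate(mod_hz):
            # Second layer: band-pass the envelope, modulus, then average.
            u2 = np.abs(apply_filter(u1, gaussian_filter(n, sr, fm, fm / 4.0)))
            s2[i, j] = apply_filter(u2, phi).real
    return s1, s2


# Two one-second test tones: same 2 kHz carrier, different modulation rates.
sr = 8000
t = np.arange(sr) / sr


def am_tone(mod_rate_hz):
    """Amplitude-modulated 2 kHz tone (hypothetical test signal)."""
    envelope = 1.0 + 0.8 * np.cos(2 * np.pi * mod_rate_hz * t)
    return envelope * np.cos(2 * np.pi * 2000.0 * t)


s1_slow, s2_slow = scattering_sketch(am_tone(40.0), sr, [2000.0], [40.0, 160.0])
s1_fast, s2_fast = scattering_sketch(am_tone(160.0), sr, [2000.0], [40.0, 160.0])
```

In this toy case the first-order outputs of the two tones are nearly identical, because the 10 Hz averaging removes both modulation rates, while the second-order outputs peak in different modulation channels and so still tell the tones apart.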
