Paper Information - Time-Frequency Feature Decomposition Based on Sound Duration for Acoustic Scene Classification

Acoustic scene classification is the task of identifying the type of acoustic environment in which a given audio signal was recorded. The signal is a mixture of sound events with various characteristics. In-depth and focused analysis is needed to find the most representative sound patterns for recognizing and differentiating scenes. In this paper, we propose a feature decomposition method based on temporal median filtering, and use convolutional neural networks to model long-duration background sounds and transient sounds separately. Experiments on log-mel and wavelet-based time-frequency features show that the proposed method leads to better classification accuracy. Analysis of the detailed experimental results reveals that (1) long-duration sounds are generally the most informative for acoustic scene classification; and (2) the sound duration that matters most may differ across types of acoustic scenes.
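The abstract names temporal median filtering as the basis of the decomposition but gives no details, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the time-frequency feature is a list of rows (one per frequency bin), that the long-duration component is the median over a sliding window along the time axis, and that the transient component is the residual after subtracting it. The function names, the kernel size, and the edge handling (truncated windows) are all hypothetical choices made for illustration.

```python
from statistics import median


def temporal_median_filter(spec, kernel):
    """Median-filter each frequency row along the time axis.

    spec: list of rows (one per frequency bin), each a list of frame values.
    kernel: odd window length in frames (assumed parameter, not from the paper).
    Windows are truncated at the edges rather than padded.
    """
    if kernel < 1 or kernel % 2 == 0:
        raise ValueError("kernel must be a positive odd integer")
    half = kernel // 2
    filtered = []
    for row in spec:
        n = len(row)
        filtered.append(
            [median(row[max(0, t - half):min(n, t + half + 1)]) for t in range(n)]
        )
    return filtered


def decompose(spec, kernel):
    """Split spec into a long-duration part and a transient residual.

    Assumption: transient = original - median-filtered background.
    """
    background = temporal_median_filter(spec, kernel)
    transient = [
        [x - b for x, b in zip(row, bg_row)]
        for row, bg_row in zip(spec, background)
    ]
    return background, transient


# Toy feature: bin 0 has a one-frame burst, bin 1 is a steady sound.
spec = [
    [1.0, 1.0, 9.0, 1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0, 2.0, 2.0, 2.0],
]
background, transient = decompose(spec, kernel=3)
print(background[0])  # steady level survives the filter
print(transient[0])   # the one-frame burst ends up in the residual
```

In this toy case the short burst is removed from the filtered component and appears only in the residual, while the steady bin is left unchanged. A longer kernel would push longer events into the residual as well, which is presumably how sound duration can be used to steer the split; the two components could then be fed to separate convolutional networks as the abstract describes.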

Tan Lee | Yuzhong Wu
