A Re-trained Model Based On Multi-kernel Convolutional Neural Network for Acoustic Scene Classification

This paper proposes a deep learning framework for Acoustic Scene Classification (ASC), the task of identifying the location in which an audio recording was made. For front-end feature extraction, we use three types of spectrograms: Gammatone (GAM), log-Mel, and Constant Q Transform (CQT). For back-end classification, we present a re-trained model that uses a multi-kernel CDNN-based architecture for the pre-training process and a DNN-based network for the post-training process. Our results on the DCASE 2016 dataset show a significant improvement of nearly 8% over the DCASE baseline of 77.2%.
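As a rough illustration of why the three spectrograms can carry complementary information, the sketch below computes the band centre frequencies that each front end would typically use. It is not the paper's implementation: the function names, the band counts, and the frequency ranges are assumptions chosen for the example. It uses the common HTK-style mel formula, the Glasberg and Moore ERB-rate formula often used for Gammatone filterbanks, and the geometric spacing that defines the CQT.

```python
import math


def hz_to_mel(f_hz):
    """HTK-style mel scale (used by many log-Mel front ends)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def hz_to_erb_rate(f_hz):
    """ERB-rate scale (Glasberg and Moore), common for Gammatone filterbanks."""
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)


def erb_rate_to_hz(erb):
    return (10.0 ** (erb / 21.4) - 1.0) / 0.00437


def warped_centres(n_bands, f_min, f_max, forward, inverse):
    """Centres equally spaced on a warped axis, strictly inside (f_min, f_max)."""
    lo, hi = forward(f_min), forward(f_max)
    step = (hi - lo) / (n_bands + 1)
    return [inverse(lo + step * (i + 1)) for i in range(n_bands)]


def mel_centres(n_bands, f_min, f_max):
    return warped_centres(n_bands, f_min, f_max, hz_to_mel, mel_to_hz)


def gammatone_centres(n_bands, f_min, f_max):
    return warped_centres(n_bands, f_min, f_max, hz_to_erb_rate, erb_rate_to_hz)


def cqt_centres(n_bins, f_min, bins_per_octave):
    """CQT bins are geometrically spaced: f_k = f_min * 2^(k / bins_per_octave)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


if __name__ == "__main__":
    # Illustrative settings only (assumed, not taken from the paper).
    for name, centres in [
        ("log-Mel", mel_centres(8, 50.0, 8000.0)),
        ("Gammatone", gammatone_centres(8, 50.0, 8000.0)),
        ("CQT", cqt_centres(8, 55.0, 1)),
    ]:
        print(name, [round(f, 1) for f in centres])
```

The mel and ERB-rate axes are roughly linear at low frequencies and logarithmic at high frequencies, while the CQT is logarithmic throughout, so the three representations allocate their resolution differently across the spectrum.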
