The Details That Matter: Frequency Resolution of Spectrograms in Acoustic Scene Classification

This study describes a convolutional neural network model submitted to the acoustic scene classification task of the DCASE 2017 challenge. The model is evaluated with input spectrograms of different frequency resolutions; the results show that a higher number of mel bands improves accuracy with negligible impact on training time. The convolutional model focuses solely on the ambient characteristics of the audio scene, and a proposed extension with pretrained event detectors shows potential for further exploration.
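
To make the notion of frequency resolution concrete, the sketch below shows how the spacing between mel band centers shrinks as the number of bands grows. It is an illustration only, not the authors' feature pipeline: the HTK-style mel formula, the band counts (40, 100, 200), and the 22050 Hz upper limit are all assumptions chosen for the example, and the helper names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert Hz to mel using the HTK-style formula (one common convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_centers(n_mels, f_min=0.0, f_max=22050.0):
    """Center frequencies (Hz) of n_mels bands spaced evenly on the mel scale.

    f_max=22050.0 assumes 44.1 kHz audio; adjust for other sampling rates.
    """
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / (n_mels + 1)
    return [mel_to_hz(m_min + step * (i + 1)) for i in range(n_mels)]


# Illustrative band counts (assumed, not taken from the paper).
for n_mels in (40, 100, 200):
    centers = mel_band_centers(n_mels)
    low_spacing = centers[1] - centers[0]      # spacing between the two lowest bands
    high_spacing = centers[-1] - centers[-2]   # spacing between the two highest bands
    print(f"{n_mels:4d} bands: {low_spacing:7.1f} Hz (low end) "
          f"to {high_spacing:7.1f} Hz (high end)")
```

More bands give narrower spacing across the whole range, so each spectrogram row covers a smaller slice of frequency; the number of rows in the network input grows accordingly.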
