Acoustic scene classification using self-determination convolutional neural network

A practical acoustic scene classification (ASC) system identifies the scene of an environment by analyzing the properties of audio data that characterize the surroundings. Building a robust ASC system is challenging because sound from a natural environment comprises various audio sources and the microphones are not arranged under controlled conditions. Furthermore, not all sounds in long-duration audio data are relevant for identifying the scene label; some contribute noise or mix cues from several scene labels. In this work, the problem of mixed information from various sources in a natural environment is addressed by proposing a classification system that adapts to the data in each frame of long-duration audio. The proposed ASC system is based on deep learning and is motivated by research in the computer vision field. Our method is evaluated on the TUT Acoustic Scenes 2016 dataset. Several ASC systems are implemented and discussed in the experiments. The results show that the proposed system outperforms the other systems discussed in this work.
