CLASSIFICATION USING NETWORK-IN-NETWORK BASED CONVOLUTIONAL NEURAL NETWORK

In this paper, we present our entry to the Detection and Classification of Acoustic Scenes and Events (DCASE) challenge. This paper describes the results of our proposed system for the automatic audio scene classification task. Our approach is based on a deep learning method adopted from the computer vision research field: a convolutional neural network (CNN) is used to solve audio-based scene classification, and specifically the network-in-network (NIN) architecture is used to build the classifier. For feature extraction, mel-frequency cepstral coefficients (MFCCs) are used as the input vectors for the classifier. Unlike the original network-in-network architecture, this work performs 1-D convolution instead of 2-D convolution. The classifier is trained on every frame of the MFCC feature set, and the per-frame results are then thresholded and voted on to choose the final scene label of each audio recording. The proposed system outperforms the baseline system provided by the DCASE challenge on both the development and evaluation datasets.
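The abstract states that per-frame results are "thresholded and voted" but does not give the exact rule. Below is a minimal sketch of one plausible reading, assuming the classifier outputs a posterior vector for each MFCC frame: frames whose top posterior falls below a confidence threshold are discarded, and the clip label is the majority vote of the remaining frames. The function name `vote_scene_label`, the default threshold of 0.5, and the fallback to voting over all frames when none pass are hypothetical choices, not taken from the paper.

```python
import numpy as np


def vote_scene_label(frame_posteriors, threshold=0.5):
    """Aggregate per-frame class posteriors into one clip-level scene label.

    Hypothetical scheme (the paper's exact rule may differ):
    1. Each frame votes for its most probable class.
    2. Only frames whose top posterior is >= threshold are kept.
    3. The clip label is the majority vote of the kept frames;
       if no frame passes the threshold, all frames vote instead.
    """
    probs = np.asarray(frame_posteriors, dtype=float)  # shape: (n_frames, n_classes)
    if probs.ndim != 2 or probs.shape[0] == 0:
        raise ValueError("expected a non-empty (n_frames, n_classes) array")

    frame_labels = probs.argmax(axis=1)          # per-frame predicted class
    confident = probs.max(axis=1) >= threshold   # per-frame confidence mask
    voters = frame_labels[confident] if confident.any() else frame_labels

    votes = np.bincount(voters, minlength=probs.shape[1])
    return int(votes.argmax())  # ties resolve to the lowest class index


# Toy posteriors for 5 frames and 3 scene classes (made-up numbers).
posteriors = np.array([
    [0.70, 0.20, 0.10],  # confident, class 0
    [0.40, 0.35, 0.25],  # below threshold, class 0
    [0.10, 0.80, 0.10],  # confident, class 1
    [0.20, 0.60, 0.20],  # confident, class 1
    [0.45, 0.10, 0.45],  # below threshold, class 0 (first maximum)
])

print(vote_scene_label(posteriors))                 # thresholded vote
print(vote_scene_label(posteriors, threshold=0.0))  # plain majority vote
```

In this toy example the plain majority vote over all five frames picks class 0 (three votes to two), while discarding the two low-confidence frames flips the decision to class 1. That is the kind of effect a confidence threshold is meant to have: uncertain frames, such as transitions or silence, stop outvoting the frames the classifier is sure about.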

Chien-Yao Wang, Andri Santoso, Jia-Ching Wang
