CLASSIFYING SHORT ACOUSTIC SCENES WITH I-VECTORS AND CNNS: CHALLENGES AND OPTIMISATIONS FOR THE 2017 DCASE ASC TASK

This report describes the CP-JKU team’s submissions for Task 1 (Acoustic Scene Classification, ASC) of the DCASE-2017 challenge, and discusses some observations we made about the data and the classification setup. Our approach is based on the methodology that achieved ranks 1 and 2 in the 2016 ASC challenge: a fusion of i-vector modelling using MFCC features derived from the left and right audio channels, and deep convolutional neural networks (CNNs) trained on raw spectrograms. The data provided for the 2017 ASC task presented some new challenges, in particular audio segments of very short duration. These are discussed in detail, along with the measures we took to address them. The result of our experiments is a classification system that achieves accuracies of around 90% on the provided development data, as estimated via the prescribed four-fold cross-validation scheme. On the unseen evaluation data, our best-performing method achieved an accuracy of 73.8% and 5th place in the team ranking.
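The abstract mentions a fusion of the i-vector and CNN subsystems; the exact fusion scheme and weights are detailed in the body of the report. Purely as an illustration, the sketch below shows a simple late fusion by weighted averaging of class posteriors, with hypothetical posterior vectors, 15 scene classes, and an equal-weight setting assumed for the example.

```python
import numpy as np

# Hypothetical class posteriors for one test segment from the two subsystems
# (15 acoustic scene classes in the DCASE-2017 ASC task). Random values are
# used here only to make the example self-contained.
rng = np.random.default_rng(0)
p_ivector = rng.dirichlet(np.ones(15))   # i-vector / MFCC subsystem (assumed)
p_cnn = rng.dirichlet(np.ones(15))       # CNN / spectrogram subsystem (assumed)

def late_fusion(p_a, p_b, weight=0.5):
    """Weighted average of two posterior distributions (illustrative only;
    the actual fusion used in the submission may differ)."""
    fused = weight * p_a + (1.0 - weight) * p_b
    return fused / fused.sum()

p_fused = late_fusion(p_ivector, p_cnn, weight=0.5)
predicted_class = int(np.argmax(p_fused))
print(predicted_class, p_fused[predicted_class])
```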