A punishment voting algorithm based on super categories construction for acoustic scene classification

In acoustic scene classification research, an audio segment is usually split into multiple samples, and majority voting is then used to combine the per-sample results. In this paper, we propose a punishment voting algorithm based on a super category construction method for acoustic scene classification. Specifically, we propose a DenseNet-like model as the base classifier, trained on constant-Q transform (CQT) spectrograms generated from the raw audio segments. Using the results of the base classifier, we propose a super category construction method based on spectral clustering. Super classifiers corresponding to the constructed super categories are then trained. Finally, the super classifiers are used to enhance the majority voting of the base classifier through punishment voting. Experiments show that punishment voting clearly improves performance on both the DCASE2017 Development dataset and the LITIS Rouen dataset.
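To make the voting step concrete, the sketch below contrasts plain majority voting over the samples of one segment with one possible penalized variant. The abstract does not specify the punishment rule, so the weighting used here is an assumption for illustration only: a vote is down-weighted when its class does not belong to the super category predicted for the same sample. The names `punishment_vote`, `class_to_super`, and `penalty`, and the example labels, are hypothetical.

```python
from collections import Counter


def majority_vote(sample_labels):
    """Return the class predicted for the most samples of one segment."""
    counts = Counter(sample_labels)
    # most_common(1) yields the (label, count) pair with the highest count.
    return counts.most_common(1)[0][0]


def punishment_vote(sample_labels, super_labels, class_to_super, penalty=0.5):
    """Weighted vote; ASSUMED rule, not taken from the paper.

    sample_labels:  base-classifier prediction for each sample
    super_labels:   super-classifier prediction for each sample
    class_to_super: mapping from class to its super category
    penalty:        amount subtracted from a vote that disagrees with the
                    super-classifier prediction for that sample
    """
    scores = {}
    for label, super_label in zip(sample_labels, super_labels):
        weight = 1.0
        if class_to_super[label] != super_label:
            weight -= penalty  # punish votes outside the predicted super category
        scores[label] = scores.get(label, 0.0) + weight
    # Return the class with the highest accumulated score.
    return max(scores, key=scores.get)


# Hypothetical segment split into five samples.
class_to_super = {"park": "outdoor", "bus": "vehicle", "tram": "vehicle"}
sample_labels = ["park", "park", "park", "bus", "bus"]
super_labels = ["vehicle"] * 5

plain = majority_vote(sample_labels)
punished = punishment_vote(sample_labels, super_labels, class_to_super)
```

In this toy case the plain vote picks `park` with three of five votes, while the penalized vote halves each `park` vote (score 1.5) and leaves the two `bus` votes at full weight (score 2.0), so `bus` wins. The actual penalty scheme in the paper may differ.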
