Acoustic Scene Classification Using Deep Audio Feature and BLSTM Network

Although acoustic scene classification has been received great attention from researchers in the field of audio signal processing, it is still a challenging and unsolved task to date. In this paper, we present our work of acoustic scene classification for the challenge of the Detection and Classification of Acoustic Scenes and Events 2017, i.e., DCASE2017 challenge, using a feature of Deep Audio Feature (DAF) for acoustic scene representation and a classifier of Bidirectional Long Short Term Memory (BLSTM) network for acoustic scene classification. We first use a deep neural network to generate the DAF from Mel frequency cepstral coefficients, and then adopt a network of BLSTM fed by the DAF for acoustic scene classification. When evaluated on the official datasets of the DCASE2017 challenge, the proposed system outperforms the baseline system in terms of classification accuracy.

[1]  Dong Yu,et al.  Improved Bottleneck Features Using Pretrained Deep Neural Networks , 2011, INTERSPEECH.

[2]  Mathieu Lagrange,et al.  Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016), Budapest, Hungary, 3 Sep 2016. , 2016 .

[3]  Yoshua Bengio,et al.  Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation , 2014, EMNLP.

[4]  Geoffrey E. Hinton,et al.  Speech recognition with deep recurrent neural networks , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Kuldip K. Paliwal,et al.  Bidirectional recurrent neural networks , 1997, IEEE Trans. Signal Process..

[6]  Vittorio Murino,et al.  Audio Surveillance , 2014, ACM Comput. Surv..

[7]  Tao Li,et al.  Characteristics-based effective applause detection for meeting speech , 2009, Signal Process..

[8]  Birger Kollmeier,et al.  On the use of spectro-temporal features for the IEEE AASP challenge ‘detection and classification of acoustic scenes and events’ , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[9]  Gerhard Widmer,et al.  CP-JKU SUBMISSIONS FOR DCASE-2016 : A HYBRID APPROACH USING BINAURAL I-VECTORS AND DEEP CONVOLUTIONAL NEURAL NETWORKS , 2016 .

[10]  Gaël Richard,et al.  Acoustic scene classification with matrix factorization for unsupervised feature learning , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[11]  Wei Li,et al.  Unsupervised classification of speaker roles in multi-participant conversational speech , 2017, Comput. Speech Lang..

[12]  Qian Huang,et al.  Using multi-stream hierarchical deep neural network to extract deep audio feature for acoustic event detection , 2016, Multimedia Tools and Applications.

[13]  Alain Rakotomamonjy,et al.  Histogram of gradients of Time-Frequency Representations for Audio scene detection , 2015, ArXiv.

[14]  Jürgen Schmidhuber,et al.  Learning to forget: continual prediction with LSTM , 1999 .

[15]  Birger Kollmeier,et al.  Classifier Architectures for Acoustic Scenes and Events: Implications for DNNs, TDNNs, and Perceptual Features from DCASE 2016 , 2017, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[16]  Qian Huang,et al.  Unsupervised detection of acoustic events using information bottleneck principle , 2017, Digit. Signal Process..

[17]  Daniel P. W. Ellis,et al.  Proceedings of the Detection and Classification of Acoustic Scenes and Events 2016 Workshop (DCASE2016) , 2016 .

[18]  Xinxing Li,et al.  A deep bidirectional long short-term memory based multi-scale approach for music dynamic emotion prediction , 2016, 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[19]  Andrey Temko,et al.  CLEAR Evaluation of Acoustic Event Detection and Classification Systems , 2006, CLEAR.

[20]  Kun Li,et al.  Voice conversion using deep Bidirectional Long Short-Term Memory based Recurrent Neural Networks , 2015, 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[21]  Tuomas Virtanen,et al.  TUT database for acoustic scene classification and sound event detection , 2016, 2016 24th European Signal Processing Conference (EUSIPCO).

[22]  Nicolai Petkov,et al.  Audio Surveillance of Roads: A System for Detecting Anomalous Sounds , 2016, IEEE Transactions on Intelligent Transportation Systems.

[23]  Ankit Shah,et al.  DCASE2017 Challenge Setup: Tasks, Datasets and Baseline System , 2017, DCASE.

[24]  S. Squartini,et al.  DCASE 2016 Acoustic Scene Classification Using Convolutional Neural Networks , 2016, DCASE.

[25]  Huy Phan,et al.  Random Regression Forests for Acoustic Event Detection and Classification , 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.