Paper information - Large-scale audio feature extraction and SVM for acoustic scene classification

Large-scale audio feature extraction and SVM for acoustic scene classification

This work describes a system for acoustic scene classification based on large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30-second recordings of 10 different acoustic scenes. From the highly variable recordings, a large number of spectral, cepstral, energy, and voicing-related audio features are extracted. Classification is performed on short windows using a sliding-window approach: support vector machines (SVMs) classify these short segments, and a majority voting scheme yields a decision for the full recording. On the official development set of the challenge, an accuracy of 73% is achieved. SVMs are compared with a nearest-neighbour classifier and an approach called Latent Perceptual Indexing; SVMs achieve the best results. A feature analysis using the t-statistic shows that Mel spectra are the most relevant features.
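The sliding-window classification and majority voting described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `sliding_windows`, `majority_vote`, `classify_recording`, and `toy_classifier` are hypothetical, the window and hop sizes are arbitrary, and the toy threshold rule merely stands in for a trained SVM.

```python
from collections import Counter
from typing import Callable, List, Sequence


def sliding_windows(n_frames: int, win: int, hop: int) -> List[range]:
    """Return frame-index ranges of length `win`, advanced by `hop` frames."""
    if win <= 0 or hop <= 0:
        raise ValueError("win and hop must be positive")
    return [range(s, s + win) for s in range(0, n_frames - win + 1, hop)]


def majority_vote(labels: Sequence[str]) -> str:
    """Return the most frequent label; ties go to the label seen first."""
    if not labels:
        raise ValueError("no window decisions to vote on")
    return Counter(labels).most_common(1)[0][0]


def classify_recording(
    frames: Sequence[float],
    classify_window: Callable[[Sequence[float]], str],
    win: int,
    hop: int,
) -> str:
    """Classify each short window, then vote to label the whole recording."""
    decisions = [
        classify_window([frames[i] for i in idx])
        for idx in sliding_windows(len(frames), win, hop)
    ]
    return majority_vote(decisions)


def toy_classifier(window: Sequence[float]) -> str:
    """Hypothetical stand-in for a trained SVM: thresholds the mean value."""
    return "busy" if sum(window) / len(window) > 4 else "quiet"


# Assumed toy data: one scalar feature per frame instead of a real feature vector.
frames = [1] * 8 + [9] * 2
label = classify_recording(frames, toy_classifier, win=4, hop=2)
```

In this toy run three of the four windows are labelled "quiet", so the vote assigns "quiet" to the whole recording even though the last window disagrees. That is the point of voting over short segments: isolated misclassified windows do not determine the recording-level decision.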

Björn W. Schuller | Gerhard Rigoll | Jürgen T. Geiger
