Large-scale audio feature extraction and SVM for acoustic scene classification

This work describes a system for acoustic scene classification using large-scale audio feature extraction. It is our contribution to the Scene Classification track of the IEEE AASP Challenge on Detection and Classification of Acoustic Scenes and Events (D-CASE). The system classifies 30 second long recordings of 10 different acoustic scenes. From the highly variable recordings, a large number of spectral, cepstral, energy and voicing-related audio features are extracted. Using a sliding window approach, classification is performed on short windows. SVM are used to classify these short segments, and a majority voting scheme is employed to get a decision for longer recordings. On the official development set of the challenge, an accuracy of 73 % is achieved. SVM are compared with a nearest neighbour classifier and an approach called Latent Perceptual Indexing, whereby SVM achieve the best results. A feature analysis using the t-statistic shows that mainly Mel spectra are the most relevant features.

[1]  Bo Xu,et al.  SVM-based audio scene classification , 2005, 2005 International Conference on Natural Language Processing and Knowledge Engineering.

[2]  Andrey Temko,et al.  Classification of meeting-room acoustic events with support vector machines and variable-feature-set clustering , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[3]  Andrey Temko,et al.  CLEAR Evaluation of Acoustic Event Detection and Classification Systems , 2006, CLEAR.

[4]  Björn W. Schuller,et al.  Acoustic Geo-Sensing: Recognising cyclists' route, route direction, and route progress from cell-phone audio , 2013, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.

[5]  Björn W. Schuller,et al.  Learning New Acoustic Events in an HMM-Based System Using MAP Adaptation , 2011, INTERSPEECH.

[6]  Ian H. Witten,et al.  The WEKA data mining software: an update , 2009, SKDD.

[7]  Björn Schuller,et al.  Opensmile: the munich versatile and fast open-source audio feature extractor , 2010, ACM Multimedia.

[8]  Dan Stowell,et al.  Detection and classification of acoustic scenes and events: An IEEE AASP challenge , 2013, 2013 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics.

[9]  Vesa T. Peltonen,et al.  Computational auditory scene recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[10]  Guy J. Brown,et al.  Computational Auditory Scene Analysis: Principles, Algorithms, and Applications , 2006 .

[11]  Shrikanth S. Narayanan,et al.  Saliency-driven unstructured acoustic scene classification using latent perceptual indexing , 2009, 2009 IEEE International Workshop on Multimedia Signal Processing.

[12]  Tsuhan Chen,et al.  Audio Feature Extraction and Analysis for Scene Segmentation and Classification , 1998, J. VLSI Signal Process..