MIREX 2015: METHODS FOR SPEECH / MUSIC DETECTION AND CLASSIFICATION

This submission proposes and evaluates a set of ensemble-learning-based methods for the MIREX 2015 Speech / Music Classification and Detection task. The main algorithm for the Detection task uses self-similarity matrix analysis to detect homogeneous audio segments, which are subsequently classified as music or speech by a Random Forest classifier. Two variations of the main algorithm are also proposed: the first adds a silence detection algorithm, while the second omits the self-similarity information and relies solely on the Random Forest classifier. For the Classification task, two variants are proposed, both based on a sliding-window classification approach. The first uses a pre-trained model; in the second, classification is preceded by a training phase that uses the training data provided during the submission evaluation.
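The abstract does not specify how the self-similarity matrix is analysed, so the following is only a minimal sketch of one common approach: a cosine self-similarity matrix scanned along its diagonal with a checkerboard kernel, in the style of novelty-based segmentation. The function names, the two-dimensional toy features, the kernel half-width, and the threshold are all illustrative assumptions, not details of the submission. In a full pipeline, each segment between detected boundaries would then be passed to a classifier such as a Random Forest, which is not shown here.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity matrix for a (frames x dims) feature array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)  # guard against zero vectors
    return unit @ unit.T


def checkerboard_kernel(half_width):
    """Kernel with +1 on within-segment quadrants and -1 on cross quadrants."""
    sign = np.concatenate([-np.ones(half_width), np.ones(half_width)])
    return np.outer(sign, sign)


def novelty_curve(ssm, half_width):
    """Correlate the kernel with the SSM along its main diagonal."""
    n = ssm.shape[0]
    kernel = checkerboard_kernel(half_width)
    novelty = np.zeros(n)
    for i in range(half_width, n - half_width + 1):
        window = ssm[i - half_width:i + half_width,
                     i - half_width:i + half_width]
        novelty[i] = np.sum(window * kernel)
    return novelty


def segment_boundaries(novelty, threshold):
    """Indices of local maxima of the novelty curve above the threshold."""
    peaks = []
    for i in range(1, len(novelty) - 1):
        is_peak = novelty[i] >= novelty[i - 1] and novelty[i] > novelty[i + 1]
        if novelty[i] > threshold and is_peak:
            peaks.append(i)
    return peaks


# Toy input (assumption): 20 frames of one "class" followed by 20 of another.
features = np.vstack([np.tile([1.0, 0.0], (20, 1)),
                      np.tile([0.0, 1.0], (20, 1))])

ssm = self_similarity(features)
novelty = novelty_curve(ssm, half_width=5)
boundaries = segment_boundaries(novelty, threshold=0.5 * novelty.max())
print(boundaries)  # [20]
```

On this toy input the novelty curve is zero inside each homogeneous block and peaks at frame 20, where the content changes. Real audio features are noisier, so the kernel size and threshold would need tuning, and a smoothed (for example Gaussian-tapered) kernel is often preferred.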
