Paper information - Comparison of different strategies for a SVM-based audio segmentation


In this paper we compare several hierarchical and multi-class approaches to the speech/music segmentation task, all based on Support Vector Machines combined with median-filter post-processing. We show the efficiency of kernel tuning through the novel Kernel Target Alignment criterion. Quantitative results give an F-measure of 96.9%, which represents an error reduction of about 50% compared with the results gathered by the French ESTER evaluation campaign. We also show the relevance of SVMs with a very low feature-vector dimension for this task.
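The abstract names two building blocks without detailing them: the Kernel Target Alignment criterion for kernel tuning and a median filter applied to the classifier decisions. The Python sketch below illustrates both under stated assumptions. The RBF kernel, the `gamma` grid, the toy data, and the filter width are all hypothetical choices made for illustration; the abstract does not specify the kernels, features, or filter length the authors actually used.

```python
import numpy as np


def rbf_kernel(X, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clip rounding negatives


def kernel_target_alignment(K, y):
    """Alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F) for labels y in {-1, +1}."""
    y = np.asarray(y, dtype=float)
    # For +/-1 labels, ||yy^T||_F equals the number of samples.
    return (y @ K @ y) / (np.linalg.norm(K) * len(y))


def median_smooth(labels, width):
    """Sliding median over frame labels; width must be odd, edges are padded."""
    labels = np.asarray(labels)
    half = width // 2
    padded = np.pad(labels, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, width)
    return np.median(windows, axis=1).astype(labels.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy two-class data standing in for low-dimensional audio features.
    X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
    y = np.array([-1] * 50 + [1] * 50)

    # Pick the kernel width with the highest alignment (hypothetical grid).
    gammas = [0.01, 0.1, 1.0, 10.0]
    scores = [kernel_target_alignment(rbf_kernel(X, g), y) for g in gammas]
    print("selected gamma:", gammas[int(np.argmax(scores))])

    # Smooth isolated frame-level decision errors (1 = music, 0 = speech).
    frames = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
    print(median_smooth(frames, 3))
```

Selecting the kernel parameter by alignment needs only the Gram matrix and the labels, so no SVM has to be trained for each candidate value. The median filter then removes isolated frame-level decisions that are shorter than about half its window.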

Gaël Richard | Mathieu Ramona

[1] Lie Lu, et al. Content-based audio segmentation using support vector machines, 2001, IEEE International Conference on Multimedia and Expo (ICME 2001).

[2] John Saunders, et al. Real-time discrimination of broadcast speech/music, 1996, IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[3] John Platt, et al. Probabilistic Outputs for Support Vector Machines and Comparisons to Regularized Likelihood Methods, 1999.

[4] Daniel P. W. Ellis, et al. Speech/music discrimination based on posterior probability features, 1999, EUROSPEECH.

[5] Xavier Rodet, et al. Hierarchical Gaussian Tree with Inertia Ratio Maximization for the Classification of Large Musical Instrument Databases, 2003.

[6] Gaël Richard, et al. Combined Supervised and Unsupervised Approaches for Automatic Segmentation of Radiophonic Audio Streams, 2007, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '07).

[7] Hideki Kawahara, et al. YIN, a fundamental frequency estimator for speech and music, 2002, The Journal of the Acoustical Society of America.

[8] Guillaume Gravier, et al. The ESTER phase II evaluation campaign for the rich transcription of French broadcast news, 2005, INTERSPEECH.

[9] Robert Tibshirani, et al. Classification by Pairwise Coupling, 1997, NIPS.

[10] Dan Istrate, et al. Broadcast news speaker tracking for ESTER 2005 campaign, 2005, INTERSPEECH.

[11] Cédric Richard, et al. A greedy algorithm for optimizing the kernel alignment and the performance of kernel machines, 2006, 14th European Signal Processing Conference.

[12] Nima Mesgarani, et al. Discrimination of speech from nonspeech based on multiscale spectro-temporal modulations, 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[13] N. Cristianini, et al. On Kernel-Target Alignment, 2001, NIPS.