An Effortless Way To Create Large-Scale Datasets For Famous Speakers

The creation of large-scale multimedia datasets has become a scientific matter in itself. Indeed, the fully manual annotation of hundreds or thousands of hours of video and/or audio turns out to be practically infeasible. In this paper, we propose an extremely handy approach to automatically construct a database of famous speakers from TV broadcast news material. We then run a user experiment with a correctly designed tool, which demonstrates that very reliable results can be obtained with this method. In particular, a thorough error analysis demonstrates the value of the approach and provides hints for improving the quality of the dataset.

Félicien Vallet | François Salmon