An Effortless Way To Create Large-Scale Datasets For Famous Speakers

The creation of large-scale multimedia datasets has become a research problem in its own right. Fully manual annotation of hundreds or thousands of hours of video and/or audio is practically infeasible. In this paper, we propose an extremely convenient approach to automatically construct a database of famous speakers from TV broadcast news material. We then run a user experiment, with a suitably designed tool, which shows that very reliable results can be obtained with this method. In particular, a thorough error analysis confirms the value of the approach and provides hints for improving the quality of the dataset.
