Automatic propagation of manual annotations for multimodal person identification in TV shows

This paper proposes an approach to propagating human annotations for person identification in a multimodal context. The underlying system combines speaker diarization and face clustering to produce multimodal clusters. Annotations are then propagated so that whole multimodal clusters are labeled rather than single tracks. An optical character recognition system provides the initial annotations. Four strategies for selecting annotation candidates are tested. The initial results of annotation propagation are promising, and a suitable active learning selection strategy could reduce the human annotator's involvement even further.
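The idea of labeling whole clusters from a few annotated tracks can be illustrated with a minimal sketch. It is not the authors' implementation: the data layout, the function names `propagate_annotations` and `next_cluster_to_annotate`, the majority vote inside a cluster, and the "largest unlabeled cluster first" selection rule are all assumptions made for illustration. In particular, the selection rule is not claimed to be one of the four strategies tested in the paper.

```python
from collections import Counter


def propagate_annotations(clusters, annotations):
    """Spread track-level names to every track in the same cluster.

    clusters:    dict mapping a cluster id to a list of track ids
                 (speech turns and face tracks mixed together).
    annotations: dict mapping a track id to a person name, e.g. names
                 obtained from OCR or from a human annotator.
    Assumption: if a cluster holds conflicting names, the most frequent wins.
    """
    propagated = {}
    for tracks in clusters.values():
        names = [annotations[t] for t in tracks if t in annotations]
        if not names:
            continue  # nothing to propagate in this cluster yet
        label = Counter(names).most_common(1)[0][0]
        for t in tracks:
            propagated[t] = label
    return propagated


def next_cluster_to_annotate(clusters, propagated):
    """Hypothetical selection rule: pick the largest cluster with no label."""
    unlabeled = {
        cid: tracks
        for cid, tracks in clusters.items()
        if not any(t in propagated for t in tracks)
    }
    if not unlabeled:
        return None
    return max(unlabeled, key=lambda cid: len(unlabeled[cid]))


# Toy example: one OCR-derived name labels a whole multimodal cluster.
clusters = {"c1": ["speech_1", "face_1", "face_2"], "c2": ["speech_2", "face_3"]}
ocr_names = {"face_1": "Alice"}
labels = propagate_annotations(clusters, ocr_names)
candidate = next_cluster_to_annotate(clusters, labels)
```

In this toy run a single name attached to `face_1` labels all three tracks of `c1`, and `c2` is returned as the next cluster to show to the annotator.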
