Paper information - GTM-UVigo System for Multimodal Person Discovery in Broadcast TV Task at MediaEval 2016

In this paper, we present the system developed by the GTM-UVigo team for the Multimodal Person Discovery in Broadcast TV task at MediaEval 2016. The proposed approach is a novel strategy for person discovery that, unlike previous works, is not based on speaker and face diarisation. Instead, the task is approached as a person recognition problem: there is an enrolment stage, where the voice and face of each discovered person are detected and, for each shot, the most suitable voice and face are assigned using the i-vector paradigm. These two biometric modalities are then combined by decision fusion.
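The enrol-then-assign idea in the abstract can be illustrated with a minimal sketch. It assumes that each enrolled person and each shot is already represented by a fixed-length vector per modality (standing in for an i-vector), that matching uses cosine similarity, and that fusion simply keeps the more confident of the two per-modality decisions above a threshold. The abstract does not specify the scoring function or the fusion rule, so the function names, the threshold, and the toy vectors below are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(shot_vec, enrolled):
    """Return (name, score) of the enrolled vector most similar to shot_vec."""
    scored = ((name, cosine(shot_vec, vec)) for name, vec in enrolled.items())
    return max(scored, key=lambda pair: pair[1])


def fuse(voice_decision, face_decision, threshold=0.5):
    """Hypothetical decision-level fusion: keep the more confident decision.

    Each decision is a (name, score) pair or None. Decisions scoring below
    the (assumed) threshold are discarded; None means no person is assigned.
    """
    candidates = [d for d in (voice_decision, face_decision)
                  if d is not None and d[1] >= threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[1])[0]


# Toy enrolment data: one vector per person and modality (made-up values).
voice_enrolled = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}
face_enrolled = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}

# Toy vectors extracted from a single shot (made-up values).
shot_voice = [0.9, 0.1, 0.0]
shot_face = [0.6, 0.4]

voice_decision = best_match(shot_voice, voice_enrolled)
face_decision = best_match(shot_face, face_enrolled)
label = fuse(voice_decision, face_decision)
print(voice_decision, face_decision, label)
```

In this toy case the two modalities disagree, and the fusion rule keeps the voice decision because its similarity score is higher. A real system would more likely calibrate the scores of each modality before comparing them.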

Carmen García-Mateo | Paula Lopez-Otero | Laura Docío Fernández
