HCMUS team at the Multimodal Person Discovery in Broadcast TV Task of MediaEval 2016

We present the method of the HCMUS team for the Multimodal Person Discovery in Broadcast TV Task at MediaEval 2016. Our method has two main processes. First, we identify a list of potential characters of interest from all video clips. Each potential character is defined as a pair consisting of a face track (a sequence of face patches) and a name. We use OCR results and face detection to find potential characters, and we apply several simple techniques to check the consistency of linking a name with a face track, which reduces the number of wrongly matched pairs. Second, we detect face patches in test video shots with a cascade DPM, extract deep features from the face patches using a very deep Convolutional Neural Network, and classify the faces using an SVM.
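To illustrate the first process, the sketch below pairs OCR-detected names with face tracks. The abstract does not say which consistency checks are used, so the rule here is an assumption made for illustration: a name is linked to a face track only when exactly one track overlaps the name's on-screen interval for a sufficient fraction of its duration. The names `FaceTrack`, `OcrName`, `link_names_to_tracks`, and the `min_ratio` threshold are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class FaceTrack:
    track_id: str
    start: float  # seconds from the beginning of the video
    end: float


@dataclass
class OcrName:
    name: str
    start: float  # interval during which the overlaid name is visible
    end: float


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def link_names_to_tracks(
    names: List[OcrName],
    tracks: List[FaceTrack],
    min_ratio: float = 0.5,  # assumed threshold, not taken from the paper
) -> List[Tuple[str, str]]:
    """Return (name, track_id) pairs that pass the assumed consistency check."""
    pairs = []
    for n in names:
        duration = n.end - n.start
        if duration <= 0:
            continue  # skip degenerate OCR intervals
        candidates = [
            t for t in tracks
            if overlap(n.start, n.end, t.start, t.end) / duration >= min_ratio
        ]
        # Keep the pair only when the match is unambiguous.
        if len(candidates) == 1:
            pairs.append((n.name, candidates[0].track_id))
    return pairs


tracks = [
    FaceTrack("t1", 0.0, 10.0),
    FaceTrack("t2", 20.0, 30.0),
    FaceTrack("t3", 21.0, 29.0),
]
names = [
    OcrName("Alice", 2.0, 6.0),    # overlaps only t1
    OcrName("Bob", 22.0, 28.0),    # overlaps t2 and t3: ambiguous, dropped
    OcrName("Carol", 40.0, 45.0),  # overlaps no track, dropped
]
potential_characters = link_names_to_tracks(names, tracks)
```

Discarding ambiguous matches trades recall for precision, which is in the spirit of reducing wrongly matched pairs before the resulting face tracks are used to train the face classifier.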