Inf@TRECVID 2019: Instance Search Task

We participated in one of the two types of Instance Search task in TRECVID 2019: Fully Automatic Search, without any human intervention. Firstly, the specific person and action are searched separately, and then we re-rank the two sorts of search results by ranking the one type scores according to the other type, as well as the score fusion. And thus, three kinds of final instance search results are submitted. Specifically, for the person search, our baseline consists of face detection, alignment and face feature selection. And for the action search, we integrate person detection, person tracking and feature selection into a framework to get the final 3D features for all tracklets in video shots. The official evaluations showed that our best search result gets the 4th place in the Automatic search.

[1]  Omkar M. Parkhi,et al.  VGGFace2: A Dataset for Recognising Faces across Pose and Age , 2017, 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018).

[2]  Dietrich Paulus,et al.  Simple online and realtime tracking with a deep association metric , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[3]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[4]  Xiaojun Chang,et al.  Adaptive Semi-Supervised Feature Selection for Cross-Modal Retrieval , 2019, IEEE Transactions on Multimedia.

[5]  Ali Farhadi,et al.  Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding , 2016, ECCV.

[6]  Andrew Zisserman,et al.  Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7]  Paul Over,et al.  Instance search retrospective with focus on TRECVID , 2017, International Journal of Multimedia Information Retrieval.

[8]  Yu Qiao,et al.  Joint Face Detection and Alignment Using Multitask Cascaded Convolutional Networks , 2016, IEEE Signal Processing Letters.

[9]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[10]  Jonathan G. Fiscus,et al.  TRECVID 2019: An evaluation campaign to benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & retrieval , 2019, TRECVID.