P-S Instance Retrieval via Early Elimination and Late Expansion

Viewers often want to quickly browse the scenes of a TV series that feature their favorite actors. In 2016, the TRECVID INS (Instance Search) task began to focus on identifying a specific target person in a specific target location. In this paper, we call this kind of task P-S (Person-Scene) Instance Retrieval. To our knowledge, most approaches handle this task by obtaining the person and scene instance retrieval results separately and then combining them directly. However, we find that the person and scene instance retrieval modules are not always effective at the same time, so aggregating their results directly decreases accuracy. To solve this problem, we obtain the results in two steps. (1) Early Elimination. Noisy data, such as shots in which the person or the scene is occluded, often make only one of the two retrieval scores high. The scores of these shots should be eliminated rather than fused together with the noise. (2) Late Expansion. Because video is continuous, the person or scene in adjacent shots is likely to be the same, so we expand the results to those eliminated shots. On this basis, we propose an early elimination and late expansion method to improve the accuracy of P-S Instance Retrieval. Experimental results on the large-scale TRECVID INS dataset demonstrate the effectiveness of the proposed method.
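
The two steps can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the elimination rule, the fusion operator, or the expansion rule, so the thresholds `tau_p` and `tau_s`, the product fusion, the neighbor `window`, and the `decay` factor below are all assumptions chosen for illustration, as are the function names.

```python
from typing import List


def early_eliminate(person: List[float], scene: List[float],
                    tau_p: float = 0.5, tau_s: float = 0.5) -> List[float]:
    """Fuse per-shot scores, zeroing shots where either module is unreliable.

    Assumption: a shot is kept only if both scores reach their thresholds;
    kept shots are fused by a simple product.
    """
    fused = []
    for p, s in zip(person, scene):
        if p >= tau_p and s >= tau_s:
            fused.append(p * s)
        else:
            # Only one score (or neither) is high: eliminate the shot
            # instead of fusing a noisy score.
            fused.append(0.0)
    return fused


def late_expand(fused: List[float], window: int = 1,
                decay: float = 0.5) -> List[float]:
    """Give eliminated shots a discounted score from temporal neighbors.

    Assumption: shots are listed in temporal order, and an eliminated shot
    inherits `decay` times the best fused score within `window` shots.
    Neighbors are read from the pre-expansion scores, so an expanded score
    never propagates further.
    """
    expanded = list(fused)
    n = len(fused)
    for i, score in enumerate(fused):
        if score > 0.0:
            continue  # shot survived elimination; keep its score
        lo, hi = max(0, i - window), min(n, i + window + 1)
        expanded[i] = decay * max(fused[lo:hi])
    return expanded


# Hypothetical scores for four consecutive shots.
person_scores = [0.9, 0.2, 0.8, 0.1]
scene_scores = [0.8, 0.9, 0.1, 0.1]
fused_scores = early_eliminate(person_scores, scene_scores)
final_scores = late_expand(fused_scores)
```

In this toy input, the second shot has a low person score (for instance, the person is occluded) and is eliminated, but it sits next to a confidently matched shot and so regains a discounted score in the expansion step. The third and fourth shots have no confident neighbor and stay at zero.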
