Relation Understanding in Videos: A Grand Challenge Overview

The ACM Multimedia 2019 Video Relation Understanding Challenge is the first grand challenge aimed at pushing video content analysis to the relational and structural level. This year, the challenge asks participants to explore and develop innovative algorithms for detecting object entities and their relations in a large-scale user-generated video dataset. The tasks will advance the foundations of future visual systems capable of performing complex inference. This paper presents an overview of the grand challenge, including its background, detailed descriptions of the three proposed tasks, the corresponding datasets for training, validation, and testing, and the evaluation process.
