Paper Information - Informedia @ TRECVID 2017

Informedia @ TRECVID 2017

We report on the system we used in the TRECVID 2017 Multimedia Event Detection (MED) and Ad-hoc Video Search (AVS) tasks. On the MED task, the CMU team submitted runs in the 010Ex setting for both Pre-specified and Ad-hoc Events. On the AVS task, the CMU team submitted fully automatic runs under the no-annotation condition.
