Tag-based video retrieval by embedding semantic content in a continuous word space

Content-based event retrieval in unconstrained web videos from query tags is a hard problem: intra-class variance is large, and video concept detectors have limited vocabulary and accuracy, creating a "semantic query gap". We present a technique to overcome this gap by using continuous word space representations to explicitly compute the similarity between query tags and detector concepts. This not only allows fast query-video similarity computation with implicit query expansion, but also leads to a compact video representation, enabling a real-time retrieval system that fits several thousand videos in a few hundred megabytes of memory. We evaluate the effectiveness of our representation on the challenging NIST MEDTest 2014 dataset.
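
As a minimal sketch of this kind of pipeline, assuming a pretrained word-embedding lookup (word_vectors), a fixed detector concept vocabulary, and precomputed per-video concept detection scores: query tags and concept names are embedded in the same word space, their cosine similarities yield per-concept weights for the query, and each video is scored by weighting its detector responses. The max-over-tags aggregation and all names below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def embed(word, word_vectors):
    # Hypothetical lookup into a pretrained word-embedding table
    # (e.g. word2vec); returns a unit-norm vector for the word.
    v = np.asarray(word_vectors[word], dtype=np.float64)
    return v / np.linalg.norm(v)

def query_concept_weights(query_tags, concept_names, word_vectors):
    # Cosine similarity between each query tag and each detector concept,
    # aggregated (here: max over tags) into one weight per concept.
    Q = np.stack([embed(t, word_vectors) for t in query_tags])     # (T, d)
    C = np.stack([embed(c, word_vectors) for c in concept_names])  # (K, d)
    sims = Q @ C.T                                                 # (T, K)
    return sims.max(axis=0)                                        # (K,)

def rank_videos(video_concept_scores, weights):
    # video_concept_scores: (N, K) detector responses per video.
    # A video's relevance is the weighted sum of its concept scores;
    # return video indices sorted from most to least relevant.
    relevance = video_concept_scores @ weights                     # (N,)
    return np.argsort(-relevance)
```

Because the query-concept weights are computed once per query and each video is reduced to a K-dimensional vector of detector scores, ranking reduces to a single matrix-vector product, which is consistent with the compact, real-time retrieval setting described above.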
