n-gram Models for Video Semantic Indexing

We propose n-gram modeling of shot sequences for video semantic indexing, the task of extracting semantic concepts from each video shot. Most previous studies of this task have assumed that the video shots in a clip are independent of each other. We instead model the time dependency between them, assuming that n consecutive video shots are dependent. By effectively using information from the preceding shots, our models improve robustness against occlusion and camera-angle changes. In our experiments on the TRECVID 2012 Semantic Indexing Benchmark, we applied the proposed models to a system based on Gaussian mixture models and support vector machines. Mean average precision improved from 30.62% to 32.14%, which, to the best of our knowledge, is the best reported performance on the TRECVID 2012 Semantic Indexing task.
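To make the idea concrete, here is a minimal sketch of n-gram rescoring over per-shot concept scores. The abstract does not give the exact formulation, so the binarization of labels, the add-alpha smoothing, and the linear interpolation weight `lam` are all illustrative assumptions; the function names and toy data are hypothetical.

```python
# Illustrative sketch: combine a shot's classifier score (e.g., from a
# GMM+SVM pipeline) with an n-gram prediction from the previous n-1 shots.
# The smoothing scheme and interpolation are assumptions, not the paper's
# exact model.

from collections import defaultdict
from itertools import product

def train_ngram(label_sequences, n=2, alpha=1.0):
    """Estimate P(label_t | label_{t-n+1..t-1}) with add-alpha smoothing
    from sequences of binary per-shot labels (1 = concept present)."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in label_sequences:
        for t in range(n - 1, len(seq)):
            history = tuple(seq[t - n + 1:t])
            counts[history][seq[t]] += 1.0
    model = {}
    for history in product((0, 1), repeat=n - 1):
        total = sum(counts[history].values()) + 2 * alpha
        model[history] = {lab: (counts[history][lab] + alpha) / total
                          for lab in (0, 1)}
    return model

def rescore(shot_scores, model, n=2, lam=0.7):
    """Interpolate each shot's classifier score with the n-gram
    prediction conditioned on the previous n-1 shots."""
    rescored = list(shot_scores)
    for t in range(n - 1, len(shot_scores)):
        history = tuple(int(s > 0.5) for s in shot_scores[t - n + 1:t])
        rescored[t] = lam * shot_scores[t] + (1 - lam) * model[history][1]
    return rescored

# Usage: a bigram model trained on binarized training labels, then
# applied to one clip's per-shot scores for a single concept.
train_labels = [[0, 1, 1, 1, 0, 0], [1, 1, 0, 0, 1, 1]]
bigram = train_ngram(train_labels, n=2)
clip_scores = [0.2, 0.8, 0.4, 0.9, 0.1]  # toy per-shot posteriors
print(rescore(clip_scores, bigram, n=2))
```

The interpolation pulls a shot's score toward what its temporal context predicts, which is how context from preceding shots can compensate for a momentarily occluded concept or a sudden camera-angle change.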