TRECVID 2010 Known-item Search by NUS

This paper describes our system for automatic search and interactive search in the known-item search (KIS) task of TRECVID 2010. The KIS task aims to find a unique video answer for each text query. This shift from traditional video search poses a series of challenges for the processing and search techniques developed over the past few years. For the automatic search task, our VisionGo system performs query expansion and analysis, then employs multi-modality features, including metadata, automatic speech recognition (ASR) and high-level features (HLF), to retrieve a ranked list of results deemed most relevant to the text-only query. To further improve search performance, we crawl an extended set of tags from YouTube to supplement the TRECVID metadata. For the interactive search task, we propose a new feedback scheme based on both related samples and exclusive negative samples to boost search performance. To accomplish this, we introduce three enhancements to our VisionGo system: a) a related sample feedback algorithm that allows users to indicate shots that are related (but not relevant) to the query; b) an exclusive negative sample selection approach; and c) clustered shot-icons for efficiently representing the whole content of a video. Results on the TRECVID 2010 video test set indicate that the enhancements are effective.
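As a rough illustration of the two ideas above, the following minimal sketch combines per-modality scores by weighted late fusion and then re-ranks the results using related and negative samples. It is not the VisionGo implementation: the weights, the function names `fuse` and `rerank`, the `sim` similarity callback, and the `alpha`/`beta` parameters are all assumptions made for this example, since the abstract does not specify how fusion or feedback is computed.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical modality weights; the abstract does not give actual values.
WEIGHTS = {"metadata": 0.5, "asr": 0.3, "hlf": 0.2}


def fuse(scores: Dict[str, Dict[str, float]],
         weights: Dict[str, float] = WEIGHTS) -> Dict[str, float]:
    """Weighted late fusion: scores[modality][video_id] -> one score per video."""
    fused: Dict[str, float] = {}
    for modality, per_video in scores.items():
        w = weights.get(modality, 0.0)  # unknown modalities contribute nothing
        for vid, s in per_video.items():
            fused[vid] = fused.get(vid, 0.0) + w * s
    return fused


def rerank(fused: Dict[str, float],
           sim: Callable[[str, str], float],
           related: Iterable[str],
           negative: Iterable[str],
           alpha: float = 0.5,
           beta: float = 0.5) -> List[str]:
    """Boost videos similar to related samples, penalize those similar to negatives.

    This is an assumed Rocchio-style adjustment, not the paper's algorithm.
    """
    related, negative = list(related), list(negative)
    adjusted = {}
    for vid, s in fused.items():
        boost = max((sim(vid, r) for r in related), default=0.0)
        penalty = max((sim(vid, n) for n in negative), default=0.0)
        adjusted[vid] = s + alpha * boost - beta * penalty
    # Highest adjusted score first.
    return sorted(adjusted, key=adjusted.get, reverse=True)
```

In use, `fuse` would be called once per query on the scores from each modality, and `rerank` would be called after each round of user feedback with the shots the user marked as related or negative.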
