Exploiting semantics on external resources to gather visual examples for video retrieval

With the huge and ever-rising amount of video content available on the Web, there is a need to support video retrieval on very large collections. Most current Web video retrieval systems rely on manual textual annotations to provide keyword-based search interfaces. These systems face two problems: users are often reluctant to provide annotations, and the quality of such annotations is questionable in many cases. A commonly used alternative is to ask the user for an image example, and to exploit the low-level features of that image to find video content whose keyframes are similar to it. In this case, the main limitation is the so-called semantic gap: low-level image features often do not match the real semantics of the videos. Moreover, this approach may be a burden to the user, as it requires finding relevant visual examples and providing them to the system. To address these limitations, in this paper we present a hybrid video retrieval technique that automatically obtains visual examples by performing textual searches on external knowledge sources, such as DBpedia, Flickr and Google Images, which differ in coverage and structure. Our approach exploits the semantics underlying these knowledge sources to address the semantic gap problem. We have conducted evaluations to assess the quality of the visual examples retrieved from these external knowledge sources. The results suggest that external knowledge can provide valid visual examples for a keyword-based query and that, when visual examples are provided explicitly by the user, it can supply additional examples that complement the manually provided ones and improve video search performance.
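
The query-by-example step described above can be illustrated with a minimal sketch. It assumes the visual examples for a keyword query have already been gathered (from an external source or from the user) and are available as lists of RGB pixels; fetching images from DBpedia, Flickr or Google Images is deliberately left out. The function names, the coarse colour histogram, the cosine similarity and the max-over-examples scoring are all illustrative assumptions, not the descriptors or fusion scheme used in the paper.

```python
from math import sqrt


def color_histogram(pixels, bins=4):
    """Quantise RGB pixels into a normalised joint colour histogram.

    A coarse stand-in for a low-level image descriptor (assumption).
    """
    hist = [0.0] * (bins ** 3)
    step = 256 // bins
    for r, g, b in pixels:
        idx = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[idx] += 1.0
    total = sum(hist) or 1.0
    return [v / total for v in hist]


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_keyframes(example_images, keyframes):
    """Rank keyframes by their best similarity to any visual example.

    example_images: list of pixel lists (the gathered visual examples).
    keyframes: dict mapping a keyframe id to its pixel list.
    Returns (keyframe_id, score) pairs, best match first.
    """
    example_feats = [color_histogram(p) for p in example_images]
    scores = {}
    for kf_id, pixels in keyframes.items():
        feat = color_histogram(pixels)
        scores[kf_id] = max(
            (cosine(feat, e) for e in example_feats), default=0.0
        )
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data: one reddish example image and two candidate keyframes.
examples = [[(250, 10, 10)] * 16]
keyframes = {
    "blue_shot": [(10, 10, 250)] * 16,
    "red_shot": [(240, 20, 20)] * 16,
}
ranking = rank_keyframes(examples, keyframes)
```

Taking the maximum over examples means a keyframe only needs to resemble one gathered example to rank highly, which is one simple way to let automatically retrieved examples complement user-provided ones; averaging or learned fusion would be equally plausible choices.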
