Utilizing semantic word similarity measures for video retrieval

This is a high-level computer vision paper that employs concepts from natural language understanding to solve the video retrieval problem. Our main contribution is the use of semantic word similarity measures (Lin and PMI-IR similarities) for video retrieval. Our approach relies on trained concept detectors and on the visual co-occurrence relations between the concepts they detect. We propose two methods for content-based retrieval of videos: (1) a method for retrieving a new concept (a concept that is not known to the system and for which no annotation is available) using semantic word similarity and visual co-occurrence; and (2) a method for retrieving videos by their relevance to a user-defined text query, using semantic word similarity and the visual content of the videos. For evaluation, we mainly used the automatic search and high-level feature extraction test sets of the TRECVID'06 benchmark, and the automatic search test set of TRECVID'07. These two data sets consist of 250 hours of multilingual news video captured from American, Arabic, German and Chinese TV channels. Although our method for retrieving a new concept is unsupervised, it outperforms the trained (supervised) concept detectors on 7 out of 20 test concepts, and overall it performs very close to the trained detectors. In addition, our visual content-based semantic retrieval method performs 81% better than the text-based retrieval method. This shows that good retrieval results can be obtained using visual content alone.
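To make the idea behind method (1) concrete, the following is a minimal sketch and not the exact formulation used in the paper. It shows the standard definitions of pointwise mutual information (the quantity that PMI-IR estimates from search-engine hit counts) and of Lin similarity, followed by one plausible way to score a shot for a new concept: a similarity-weighted average of the confidences of known concept detectors. The function names, the weighting scheme, and all numbers are illustrative assumptions; visual co-occurrence is left out for brevity.

```python
import math


def pmi(count_ab, count_a, count_b, total):
    """Pointwise mutual information: log2( p(a,b) / (p(a) * p(b)) ).

    Counts may come from any corpus; PMI-IR estimates them from
    search-engine hit counts. Returns -inf if any count is zero.
    """
    if count_ab == 0 or count_a == 0 or count_b == 0:
        return float("-inf")
    return math.log2((count_ab * total) / (count_a * count_b))


def lin_similarity(ic_a, ic_b, ic_lcs):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b)).

    ic_lcs is the information content of the most specific common
    ancestor of the two words in the taxonomy (e.g. WordNet).
    """
    denom = ic_a + ic_b
    return 0.0 if denom == 0 else 2.0 * ic_lcs / denom


def score_new_concept(detector_scores, similarity):
    """Assumed scoring rule (hypothetical, for illustration only):
    similarity-weighted average of known detector confidences for one shot.
    """
    # Negative or missing similarities contribute no weight.
    weights = {c: max(similarity.get(c, 0.0), 0.0) for c in detector_scores}
    total = sum(weights.values())
    if total == 0:
        return 0.0
    return sum(weights[c] * detector_scores[c] for c in detector_scores) / total


def rank_shots(shots, similarity):
    """Return shot ids sorted by decreasing score for the new concept."""
    return sorted(
        shots,
        key=lambda shot_id: score_new_concept(shots[shot_id], similarity),
        reverse=True,
    )


# Made-up detector confidences for two shots and two known concepts.
shots = {
    "shot_a": {"road": 0.9, "water": 0.1},
    "shot_b": {"road": 0.2, "water": 0.8},
}
# Made-up word similarities between the new concept "boat" and known concepts.
similarity_to_boat = {"road": 0.1, "water": 0.9}

ranking = rank_shots(shots, similarity_to_boat)
```

In this toy example the shot with high "water" confidence is ranked first for the unseen concept "boat", because "water" is the known concept most similar to it. The same weighting could be applied to the words of a text query for method (2), again only as an assumption about how the pieces might fit together.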
