Adding Semantics to Detectors for Video Retrieval

In this paper, we propose an automatic video retrieval method based on high-level concept detectors. Research in video analysis has reached the point where over 100 concept detectors can be learned in a generic fashion, albeit with mixed performance. Such a set of detectors is very small still compared to ontologies aiming to capture the full vocabulary a user has. We aim to throw a bridge between the two fields by building a multimedia thesaurus, i.e., a set of machine learned concept detectors that is enriched with semantic descriptions and semantic structure obtained from WordNet. Given a multimodal user query, we identify three strategies to select a relevant detector from this thesaurus, namely: text matching, ontology querying, and semantic visual querying. We evaluate the methods against the automatic search task of the TRECVID 2005 video retrieval benchmark, using a news video archive of 85 h in combination with a thesaurus of 363 machine learned concept detectors. We assess the influence of thesaurus size on video search performance, evaluate and compare the multimodal selection strategies for concept detectors, and finally discuss their combined potential using oracle fusion. The set of queries in the TRECVID 2005 corpus is too small for us to be definite in our conclusions, but the results suggest promising new lines of research.

[1]  Gerard Salton,et al.  Dynamic document processing , 1972, CACM.

[2]  Gerard Salton,et al.  A vector space model for automatic indexing , 1975, CACM.

[3]  Martin F. Porter,et al.  An algorithm for suffix stripping , 1997, Program.

[4]  Ramanathan V. Guha,et al.  Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project , 1990 .

[5]  Edward A. Fox,et al.  Combination of Multiple Searches , 1993, TREC.

[6]  George A. Miller,et al.  A Semantic Concordance , 1993, HLT.

[7]  Karen Spärck Jones,et al.  Automatic content-based retrieval of broadcast news , 1995, MULTIMEDIA '95.

[8]  Philip Resnik,et al.  Using Information Content to Evaluate Semantic Similarity in a Taxonomy , 1995, IJCAI.

[9]  Vladimir Naumovich Vapni The Nature of Statistical Learning Theory , 1995 .

[10]  Takeo Kanade,et al.  Informedia Digital Video Library , 1995, CACM.

[11]  Alan F. Smeaton,et al.  Experiments on using semantic distances between words in image caption retrieval , 1996, SIGIR '96.

[12]  Shih-Fu Chang,et al.  Visually Searching the Web for Content , 1997, IEEE Multim..

[13]  B. S. Manjunath,et al.  NeTra: A toolbox for navigating large image databases , 1997, Proceedings of International Conference on Image Processing.

[14]  Wolfgang Effelsberg,et al.  On the detection and recognition of television commercials , 1997, Proceedings of IEEE International Conference on Multimedia Computing and Systems.

[15]  Alberto Del Bimbo,et al.  Visual Image Retrieval by Elastic Matching of User Sketches , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[16]  Shih-Fu Chang,et al.  A fully automated content-based video search engine supporting spatiotemporal queries , 1998, IEEE Trans. Circuits Syst. Video Technol..

[17]  E. T. K. Sang Memory-Based Shallow Parsing , 2002, Journal of machine learning research.

[18]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[19]  Anoop Gupta,et al.  Automatically extracting highlights for TV Baseball programs , 2000, ACM Multimedia.

[20]  Arnold W. M. Smeulders,et al.  PicToSeek: combining color and shape invariant features for image retrieval , 2000, IEEE Trans. Image Process..

[21]  P. Bartlett,et al.  Probabilities for SV Machines , 2000 .

[22]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[23]  Milind R. Naphade,et al.  A probabilistic framework for semantic video indexing, filtering, and retrieval , 2001, IEEE Trans. Multim..

[24]  Bob J. Wielinga,et al.  Ontology-Based Photo Annotation , 2001, IEEE Intell. Syst..

[25]  Haim H. Permuter,et al.  IBM Research TREC 2002 Video Retrieval System , 2002, TREC.

[26]  Erik F. Tjong Kim Sang,et al.  Memory-Based Shallow Parsing , 2002, J. Mach. Learn. Res..

[27]  Djoerd Hiemstra,et al.  A Probabilistic Multimedia Retrieval Model and Its Evaluation , 2003, EURASIP J. Adv. Signal Process..

[28]  Anthony Hoogs,et al.  Video content annotation using visual analysis and a large semantic knowledgebase , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[29]  Ching-Yung Lin,et al.  Video Collaborative Annotation Forum: Establishing Ground-Truth Labels on Large Multimedia Datasets , 2003, TRECVID.

[30]  A. T. Schreiber,et al.  Semantic Annotation of Image Collections , 2003 .

[31]  M. de Rijke,et al.  The effectiveness of combining information retrieval strategies for European languages , 2004, SAC '04.

[32]  Rong Yan,et al.  Learning query-class dependent weights in automatic video retrieval , 2004, MULTIMEDIA '04.

[33]  Jianping Fan,et al.  ClassView: hierarchical video shot classification, indexing, and accessing , 2004, IEEE Transactions on Multimedia.

[34]  Eero Hyvönen,et al.  A Cultural Community Portal for Publishing Museum Collections on the Semantic Web , 2004, ECAI Workshop on Application of Semantic Web Technologies to Web Communities.

[35]  Hugo Liu,et al.  ConceptNet — A Practical Commonsense Reasoning Tool-Kit , 2004 .

[36]  Gang Wang,et al.  TRECVID 2004 Search and Feature Extraction Task by NUS PRIS , 2004, TRECVID.

[37]  M. de Rijke,et al.  Monolingual Document Retrieval for European Languages , 2004, Information Retrieval.

[38]  Marcel Worring,et al.  Building a visual ontology for video retrieval , 2005, MULTIMEDIA '05.

[39]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[40]  John R. Smith,et al.  A web-based system for collaborative annotation of large image and video collections: an evaluation and user study , 2005, MULTIMEDIA '05.

[41]  Alberto Del Bimbo,et al.  Automatic video annotation using ontologies extended with visual information , 2005, MULTIMEDIA '05.

[42]  Joshua R. Smith,et al.  A Web-based System for Collaborative Annotation of Large Image and Video Collections , 2005 .

[43]  Shih-Fu Chang,et al.  Automatic discovery of query-class-dependent models for multimodal search , 2005, MULTIMEDIA '05.

[44]  Pinar Duygulu Sahin,et al.  Joint visual-text modeling for automatic retrieval of multimedia documents , 2005, ACM Multimedia.

[45]  Alan F. Smeaton,et al.  Large Scale Evaluations of Multimedia Information Retrieval: The TRECVid Experience , 2005, CIVR.

[46]  Graeme Hirst,et al.  Evaluating WordNet-based Measures of Lexical Semantic Relatedness , 2006, CL.

[47]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[48]  Nuno Vasconcelos,et al.  Query by Semantic Example , 2006, CIVR.

[49]  Cor J. Veenman,et al.  Robust Scene Categorization by Learning Image Statistics in Context , 2006, 2006 Conference on Computer Vision and Pattern Recognition Workshop (CVPRW'06).

[50]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[51]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[52]  Laura Hollink,et al.  Semantic annotation for retrieval of visual resources , 2006 .

[53]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[54]  Marcel Worring,et al.  A Learned Lexicon-Driven Paradigm for Interactive Video Retrieval , 2007, IEEE Transactions on Multimedia.

[55]  Rong Yan,et al.  Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News , 2007, IEEE Transactions on Multimedia.

[56]  Seyed M. M. Tahaghoghi,et al.  Exploring Human Judgement of Digital Imagery , 2007, ACSC.

[57]  Xie Kanglin Lucene Search Engine , 2007 .

[58]  Marcel Worring,et al.  Semantic Image and Video Indexing in Broad Domains , 2007, IEEE Trans. Multim..

[59]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.