Can High-Level Concepts Fill the Semantic Gap in Video Retrieval? A Case Study With Broadcast News

A number of researchers have been building high-level semantic concept detectors such as outdoors, face, building, to help with semantic video retrieval. Our goal is to examine how many concepts would be needed, and how they should be selected and used. Simulating performance of video retrieval under different assumptions of concept detection accuracy, we find that good retrieval can be achieved even when detection accuracy is low, if sufficiently many concepts are combined. We also derive suggestions regarding the types of concepts that would be most helpful for a large concept lexicon. Since our user study finds that people cannot predict which concepts will help their query, we also suggest ways to find the best concepts to use. Ultimately, this paper concludes that "concept-based" video retrieval with fewer than 5000 concepts, detected with a minimal accuracy of 10% mean average precision is likely to provide high accuracy results in broadcast news retrieval.

[1]  G. Zipf,et al.  Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology. , 1949 .

[2]  B. T. Smith,et al.  Digital Signal Processing Methods for Ultrasonic Backscattered Waves in Composite Materials , 1986, IEEE 1986 Ultrasonics Symposium.

[3]  Hans-Peter Frei,et al.  Concept based query expansion , 1993, SIGIR.

[4]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[6]  Kerry Rodden,et al.  Does organisation by similarity assist image browsing? , 2001, CHI.

[7]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[8]  Azriel Rosenfeld,et al.  Face recognition: A literature survey , 2003, CSUR.

[9]  Rong Yan,et al.  The combination limit in multimedia retrieval , 2003, MULTIMEDIA '03.

[10]  Paul Over,et al.  TRECVID: Benchmarking the Effectivenss of Information Retrieval Tasks on Digital Video , 2003, CIVR.

[11]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[12]  Rong Yan,et al.  Learning query-class dependent weights in automatic video retrieval , 2004, MULTIMEDIA '04.

[13]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[14]  Eero Sormunen,et al.  End-User Searching Challenges Indexing Practices in the Digital Newspaper Photo Archive , 2004, Information Retrieval.

[15]  John R. Kender,et al.  Visual concepts for news story tracking: analyzing and exploiting the NIST TRESVID video annotation experiment , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[16]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[17]  Tsinghua University at TRECVID 2005 , 2005, TRECVID.

[18]  Witold Pedrycz,et al.  Face recognition: A study in information fusion using fuzzy integral , 2005, Pattern Recognit. Lett..

[19]  Ophir Frieder,et al.  Surrogate scoring for improved metasearch precision , 2005, SIGIR '05.

[20]  Paul Over,et al.  TRECVID 2005 - An Overview , 2005, TRECVID.

[21]  Shih-Fu Chang,et al.  Combining text and audio-visual features in video indexing , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[22]  Rong Yan,et al.  Probabilistic models for combining diverse knowledge sources in multimedia retrieval , 2006 .

[23]  Marcel Worring,et al.  The Semantic Pathfinder: Using an Authoring Metaphor for Generic Multimedia Indexing , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[24]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[25]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[26]  Dong Xu,et al.  Columbia University TRECVID-2006 Video Search and High-Level Feature Extraction , 2006, TRECVID.

[27]  Jin Zhao,et al.  Video Retrieval Using High Level Features: Exploiting Query Matching and Confidence-Based Weighting , 2006, CIVR.

[28]  Apostol Natsev,et al.  Exploring Automatic Query Refinement for Text-Based Video Retrieval , 2006, 2006 IEEE International Conference on Multimedia and Expo.

[29]  John R. Smith,et al.  IBM Research TRECVID-2009 Video Retrieval System , 2009, TRECVID.