How many high-level concepts will fill the semantic gap in video retrieval?

A number of researchers have been building high-level semantic concept detectors, such as outdoors, face, and building, to help with semantic video retrieval. Using the TRECVID video collection and LSCOM truth annotations for 300 concepts, we simulate the performance of video retrieval under different assumptions of concept detection accuracy. Even low detection accuracy provides good retrieval results when sufficiently many concepts are used. Extrapolating from these simulations under reasonable assumptions, this paper concludes that “concept-based” video retrieval with fewer than 5000 concepts, detected with a minimum accuracy of 10% mean average precision, is likely to provide high-accuracy results, comparable to text retrieval on the web, in a typical broadcast news collection. We also derive evidence that it should be feasible to find sufficiently many new, useful concepts that would be helpful for retrieval.
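The core idea of the simulation, degrading a concept detector's accuracy and measuring the effect with (mean) average precision, can be illustrated with a small sketch. This is not the authors' procedure: the Gaussian score model, the `separation` parameter, the synthetic labels, and the helper names `average_precision` and `simulate_detector` are all hypothetical choices made for illustration. Average precision here follows the standard non-interpolated definition; mean average precision (MAP) is its mean over concepts or queries.

```python
import random
from typing import List


def average_precision(scores: List[float], labels: List[bool]) -> float:
    """Non-interpolated average precision of the ranking induced by `scores`.

    Shots are ranked by descending score; AP is the mean of precision@k
    taken at the rank k of every relevant shot.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    num_relevant = sum(labels)
    if num_relevant == 0:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, (_, is_relevant) in enumerate(ranked, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / num_relevant


def simulate_detector(labels: List[bool], separation: float,
                      rng: random.Random) -> List[float]:
    """Hypothetical noisy detector (an assumption, not the paper's model):
    shots showing the concept score ~ N(separation, 1), others ~ N(0, 1)."""
    return [rng.gauss(separation if y else 0.0, 1.0) for y in labels]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical ground truth: 1,000 shots, 20 of which show the concept.
    labels = [i < 20 for i in range(1000)]
    for separation in (0.0, 1.0, 2.0, 4.0):
        scores = simulate_detector(labels, separation, rng)
        ap = average_precision(scores, labels)
        print(f"separation={separation:.1f}  AP={ap:.3f}")
```

Raising `separation` pushes relevant shots ahead of irrelevant ones, so AP climbs from near the concept's base rate toward 1.0; averaging AP over many simulated concepts would give a MAP figure comparable in kind to the 10% threshold discussed above.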
