How many high-level concepts will fill the semantic gap in news video retrieval?

A number of researchers have been building high-level semantic concept detectors such as outdoors, face, building, etc., to help with semantic video retrieval. Using the TRECVID video collection and LSCOM truth annotations from 300 concepts, we simulate performance of video retrieval under different assumptions of concept detection accuracy. Even low detection accuracy provides good retrieval results, when sufficiently many concepts are used. Considering this extrapolation under reasonable assumptions, this paper arrives at the conclusion that "concept-based" video retrieval with fewer than 5000 concepts, detected with minimal accuracy of 10% mean average precision is likely to provide high accuracy results, comparable to text retrieval on the web, in a typical broadcast news collection. We also derive evidence that it should be feasible to find sufficiently many new, useful concepts that would be helpful for retrieval.

[1]  Arden Alexander,et al.  The Thesaurus for Graphic Materials: Its History, Use, and Future , 2001 .

[2]  Milind R. Naphade,et al.  Learning the semantics of multimedia queries and concepts from a small number of examples , 2005, MULTIMEDIA '05.

[3]  R. Manmatha,et al.  Automatic image annotation and retrieval using cross-media relevance models , 2003, SIGIR.

[4]  Edward Y. Chang,et al.  Optimal multimodal fusion for multimedia data analysis , 2004, MULTIMEDIA '04.

[5]  Jun Yang,et al.  Finding Person X: Correlating Names with Visual Appearances , 2004, CIVR.

[6]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Marcel Worring,et al.  The challenge problem for automated detection of 101 semantic concepts in multimedia , 2006, MM '06.

[8]  Rong Yan,et al.  The combination limit in multimedia retrieval , 2003, MULTIMEDIA '03.

[9]  Shih-Fu Chang,et al.  Combining text and audio-visual features in video indexing , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[10]  Yihong Gong,et al.  Lessons Learned from Building a Terabyte Digital Video Library , 1999, Computer.

[11]  Brendan J. Frey,et al.  Probabilistic multimedia objects (multijects): a novel approach to video indexing and retrieval in multimedia systems , 1998, Proceedings 1998 International Conference on Image Processing. ICIP98 (Cat. No.98CB36269).

[12]  John R. Kender,et al.  Visual concepts for news story tracking: analyzing and exploiting the NIST TRESVID video annotation experiment , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[13]  Rong Yan,et al.  Probabilistic models for combining diverse knowledge sources in multimedia retrieval , 2006 .

[14]  Yuen Ren Chao,et al.  Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .

[15]  John R. Smith,et al.  VideoAnnEx: IBM MPEG-7 Annotation Tool for Multimedia Indexing and Concept Learning , 2003 .

[16]  Kerry Rodden,et al.  Does organisation by similarity assist image browsing? , 2001, CHI.

[17]  Paul Over,et al.  TRECVID: evaluating the effectiveness of information retrieval tasks on digital video , 2004, MULTIMEDIA '04.

[18]  Paul Over,et al.  TRECVID: Benchmarking the Effectivenss of Information Retrieval Tasks on Digital Video , 2003, CIVR.

[19]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[20]  Tobun Dorbin Ng,et al.  Informedia at TRECVID 2003 : Analyzing and Searching Broadcast News Video , 2003, TRECVID.

[21]  Eero Sormunen,et al.  End-User Searching Challenges Indexing Practices in the Digital Newspaper Photo Archive , 2004, Information Retrieval.

[22]  Scott P. Robertson,et al.  Proceedings of the SIGCHI Conference on Human Factors in Computing Systems , 1991 .

[23]  Ophir Frieder,et al.  Surrogate scoring for improved metasearch precision , 2005, SIGIR '05.

[24]  Arnold Neumaier,et al.  Global Optimization by Multilevel Coordinate Search , 1999, J. Glob. Optim..

[25]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[26]  John R. Smith,et al.  Large-scale concept ontology for multimedia , 2006, IEEE MultiMedia.

[27]  John R. Smith,et al.  On the detection of semantic concepts at TRECVID , 2004, MULTIMEDIA '04.

[28]  Wei-Hao Lin,et al.  News video classification using SVM-based multimodal classifiers and combination strategies , 2002, MULTIMEDIA '02.

[29]  Paul Over,et al.  TRECVID 2005 - An Overview , 2005, TRECVID.