TRECVID, an annual retrieval evaluation benchmark organized by NIST, encourages research in information retrieval from digital video. TRECVID benchmarking covers both interactive and manual searching by end users, as well as the benchmarking of supporting technologies, including shot boundary detection, extraction of semantic features, and automatic segmentation of TV news broadcasts. Evaluations conducted in the context of the TRECVID benchmarks show that, in general, speech transcripts and annotations provide the single most important clue for successful retrieval; automatically finding the individual relevant images, however, remains a tremendous and unsolved challenge. The evaluations have repeatedly found that none of the multimedia analysis and retrieval techniques provides a significant benefit over retrieval using only textual information, such as automatic speech recognition transcripts or closed captions. In interactive systems, we do find significant differences among the top systems, indicating that interfaces can make a huge difference for effective video and image search. For interactive tasks, efficient interfaces require only a few key clicks, yet display large numbers of images for visual inspection by the user. Text search generally locates the right context region in the video, but selecting the specific relevant images requires good interfaces for easily browsing the storyboard pictures. In general, TRECVID has motivated the video retrieval community to be honest about what we do not yet know how to do well (sometimes through painful failures), and has focused our work on the actual task of video retrieval, as opposed to flashy demos based on technological capabilities.