Informedia at TRECVID 2003: Analyzing and Searching Broadcast News Video

Abstract: The authors made a concentrated effort to develop an interface that allows a human to succeed with video topics as defined in TRECVID 2001. This interface was part of the TRECVID 2002 interactive query task, in which a person could issue multiple queries and refinements against the video corpus while formulating the shot answer set for the topic at hand. The interface was designed to present a visually rich set of thumbnail images, tailored for expert control over the number, scale, and attributes of the images. Armed with this interface, an expert user completely familiar with the retrieval system and its features, but with no a priori knowledge of the TRECVID 2002 search test corpus, performed well on the search tasks. The exact system used in the TRECVID 2002 interactive query task was used again for the TRECVID 2003 evaluation. To facilitate better visual browsing, we extended the storyboard idea to show keyframes across multiple video documents, where a document is derived automatically by segmenting a video production into story units using speech, silence, black frames, and other heuristics. The hierarchy of information units is frame, shot, document, and full production. A query returns a set of documents, and the shots for these documents are presented in a single storyboard, i.e., an ordered set of keyframes presented simultaneously on the computer screen, one keyframe per shot. Without further filtering, most queries would overwhelm the user with too many images. Query context can greatly reduce the cardinality of the image set. The search engine for text queries uses the Okapi method, and the multiple-document storyboard can be set to show only the shots containing matching words. This strategy of selecting a single thumbnail image to represent a video document based on query context resulted in more efficient information retrieval with greater user satisfaction.
