Towards a High-Level Audio Framework for Video Retrieval Combining Conceptual Descriptions and Fully-Automated Processes

The growing need for 'intelligent' video retrieval systems is leading to new architectures that combine multiple characterizations of video content, relying on highly expressive frameworks while providing fully automated indexing and retrieval. Combining modalities within expressive frameworks for video indexing and retrieval is therefore of major importance and, we argue, necessary for achieving significant retrieval performance. This paper presents a multi-faceted conceptual framework that integrates multiple characterizations of the audio content for automatic video retrieval. It relies on an expressive representation formalism that handles high-level audio descriptions of a video document, together with a full-text query framework, in an attempt to perform video indexing and retrieval on audio features beyond state-of-the-art architectures that operate on low-level features and keyword-annotation frameworks. Experiments on the multimedia topic search task of the TRECVID 2004 evaluation campaign validate our proposal.
