Knowledge-assisted cross-media analysis of audio-visual content in the news domain

In this paper, a complete architecture for knowledge-assisted cross-media analysis of News-related multimedia content is presented, along with its constituent components. The proposed analysis architecture employs state-of-the-art methods for the analysis of each individual modality (visual, audio, text) separately, and proposes a fusion technique based on the particular characteristics of News-related content for the combination of the individual modality analysis results. Experimental results on news broadcast video illustrate the usefulness of the proposed techniques in the automatic generation of semantic video annotations.

[1]  Jonathan G. Fiscus,et al.  Automatic Language Model Adaptation for Spoken Document Retrieval , 2000, RIAO.

[2]  David A. van Leeuwen,et al.  The AMI Speaker Diarization System for NIST RT06s Meeting Data , 2006, MLMI.

[3]  Sharon L. Oviatt,et al.  Unification-based Multimodal Integration , 1997, ACL.

[4]  Shih-Fu Chang,et al.  The holy grail of content-based media analysis , 2002 .

[5]  Michael G. Strintzis,et al.  Knowledge-assisted semantic video object detection , 2005, IEEE Transactions on Circuits and Systems for Video Technology.

[6]  Thomas Sikora,et al.  The MPEG-7 visual standard for content description-an overview , 2001, IEEE Trans. Circuits Syst. Video Technol..

[7]  Michael G. Strintzis,et al.  Ontology-Driven Semantic Video Analysis Using Visual Information Objects , 2007, SAMT.

[8]  Jesús Bescós,et al.  Real-time shot change detection over online MPEG-2 video , 2004, IEEE Transactions on Circuits and Systems for Video Technology.

[9]  Satoshi Nakamura,et al.  Statistical multimodal integration for audio-visual speech processing , 2002, IEEE Trans. Neural Networks.

[10]  Vladimir Vapnik,et al.  Statistical learning theory , 1998 .

[11]  Ulrich Schäfer,et al.  Shallow Processing with Unification and Typed Feature Structures - Foundations and Applications , 2004, Künstliche Intell..

[12]  Bo Zhang,et al.  Support vector machine learning for image retrieval , 2001, Proceedings 2001 International Conference on Image Processing (Cat. No.01CH37205).

[13]  Jane Hunter,et al.  Evaluating the application of semantic inferencing rules to image annotation , 2005, K-CAP '05.

[14]  Alberto Del Bimbo,et al.  Soccer highlights detection and recognition using HMMs , 2002, Proceedings. IEEE International Conference on Multimedia and Expo.

[15]  Robert P. W. Duin,et al.  Using two-class classifiers for multiclass classification , 2002, Object recognition supported by user interaction for service robots.

[16]  Franciska de Jong,et al.  Multimedia Search Without Visual Analysis: The Value of Linguistic and Contextual Information , 2007, IEEE Transactions on Circuits and Systems for Video Technology.

[17]  Roeland Ordelman,et al.  Filtering the unknown: speech activity detection in heterogeneous video collections , 2007, INTERSPEECH.

[18]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[19]  Franciska de Jong,et al.  Automated Speech and Audio Analysis for Semantic Access to Multimedia , 2006, SAMT.

[20]  Michael G. Strintzis,et al.  Still Image Segmentation Tools For Object-Based Multimedia Applications , 2004, Int. J. Pattern Recognit. Artif. Intell..

[21]  Jan Alexandersson,et al.  Overlay as the Basic Operation for Discourse Processing in a Multimodal Dialogue System , 2001 .

[22]  Paul Over,et al.  Evaluation campaigns and TRECVid , 2006, MIR '06.