Contribution of NLP to the Content Indexing of Multimedia Documents

This paper describes the role that natural language processing (NLP) can play in multimedia applications. As a first example, we present an approach to the conceptual indexing of soccer videos with the help of structured information that NLP tools automatically extract from multiple sources relating to the video content, namely a rich range of textual and transcribed sources covering soccer games. This work was investigated and developed in the EU-funded project MUMIS. As a second example, we briefly describe ongoing work in the Esperonto project, which deals with upgrading the current web towards the Semantic Web (SW), including the automatic semantic indexing of web pages that combine text and images.
