That Obscure Object of Desire: Multimedia Metadata on the Web, Part 1

In this article we discuss the advances in, and remaining problems of, using audio-visual media in a semantic-based environment such as the Semantic Web, facilitated through media-aware and ontology-based metadata. Our discussion is motivated predominantly by the two most widely known approaches to machine-processable, semantic-based content description: the Semantic Web activity of the W3C [2, 4] and ISO’s efforts toward complex media content modeling, in particular the Multimedia Content Description Interface (MPEG-7) [30, 31, 32, 33, 34]. We chose these two approaches because they provide the potential techniques to establish a media-aware Semantic Web, even though at the time of writing they seem to be diverging rather than converging. The Semantic Web is intended to bring machine-processable content to Web pages, thus extending the current Web. The aim is to add ontology-based metadata to Web resources to improve Internet search and to provide means for machine-based reasoning about the content. A major drawback of current Semantic Web developments, however, is their media-agnostic view of Web resources: the specific needs of dynamic audio-visual media, with its variety of data representations, are not recognized. Addressing those needs is precisely the intention of MPEG, partially in MPEG-4 [27] and MPEG-21 [35] and fully in MPEG-7. In this paper, we explain that the conceptual ideas and technologies discussed in both approaches are essential for the next step in Web-based multimedia development. Unfortunately, there are still many practical obstacles that block their widespread use for providing multimedia metadata on the Web. We show that a media-aware Semantic Web will blur the boundaries between traditional categories such as preproduction, production, and postproduction, with far-reaching effects on concepts such as data, metadata, consumer, and producer.
The paper is structured as follows. We first provide a scenario to explain our vision of a media-aware Semantic Web and derive from it a number of problems regarding the semantic content description of media units. We then discuss the multimedia production chain, emphasizing in particular the role of progressive metadata production. As a result, we distill a set of media-based metadata production requirements and show how current media production environments fail to address them. We then introduce those parts of the W3C and ISO standardization work that are relevant to our discussion. We analyze their ability to define structures for describing media semantics, and we discuss syntactic and semantic problems, ontological problems for media semantics, and the problems of applying the theoretical concepts to real-world problems.

[1] Steven J. DeRose, et al. XML Pointer Language (XPointer) Version 1, 2001.

[2] Peter F. Patel-Schneider, et al. The Yin/Yang web: XML syntax and RDF semantics, 2002, WWW '02.

[3] Jane Hunter, et al. A Comparison of Schemas for Video Metadata Representation, 1999, Comput. Networks.

[4] James A. Hendler, et al. Web ontology language (OWL) reference version 1, 2002.

[5] Karen Spärck Jones, et al. Audio Indexing and Retrieval of Complete Broadcast News Shows, 2000, RIAO.

[6] Jane Hunter, et al. Adding Multimedia to the Semantic Web: Building an MPEG-7 ontology, 2001, SWWS.

[7] Ian Horrocks, et al. The Semantic Web: The Roles of XML and RDF, 2000, IEEE Internet Comput.

[8] Alan P. Parkes. Settings and the setting structure: the description and automated propagation of networks for perusing videodisk image states, 1989, SIGIR '89.

[9] Bob J. Wielinga, et al. Ontology-Based Photo Annotation, 2001, IEEE Intell. Syst.

[10] Jorma Tarhio, et al. Searching monophonic patterns within polyphonic sources, 2000.

[11] Philippe Aigrain, et al. Medium knowledge-based macro-segmentation of video into sequences, 1997.

[12] Glorianna Davenport, et al. The Stratification System - A Design Environment for Random Access, 1992, NOSSDAV.

[13] David Orchard, et al. XML Linking Language (XLink), 2001.

[14] Michelle Y. Kim, et al. Extensible MPEG-4 textual format (XMT), 2000, MULTIMEDIA '00.

[15] David A. Duce, et al. Scalable Vector Graphics SVG 1.0 Specification, 2000.

[16] Roy T. Fielding, et al. Uniform Resource Identifiers (URI): Generic Syntax, 1998, RFC.

[17] Mark Davis, et al. The Unicode Standard, Version 3.0, 2000.

[18] Simone Santini, et al. Integrated browsing and querying for image databases, 2000, IEEE MultiMedia.

[19] Alberto Del Bimbo, et al. Semantics in Visual Information Retrieval, 1999, IEEE Multim.

[20] Arjeh M. Cohen, et al. Synchronized Multimedia Integration Language (SMIL) 2.0, 1998.

[21] Lynda Hardman, et al. Denotative and connotative semantics in hypermedia: proposal for a semiotic-aware architecture, 2001, New Rev. Hypermedia Multim.

[22] John Domingue, et al. Visualizing Internetworked Argumentation, 2003, Visualizing Argumentation.

[23] P. Hoschka (ed.), et al. Synchronized Multimedia Integration Language (SMIL) 1.0 Specification, 1998.

[24] Alberto Del Bimbo, et al. Visual information retrieval, 1999.

[25] Jane Hunter, et al. Combining RDF and XML schemas to enhance interoperability between metadata application profiles, 2001, WWW '01.

[26] Alan P. Parkes, et al. The Application of Video Semantics and Theme Representation in Automated Video Editing, 2004, Multimedia Tools and Applications.

[27] David Pye, et al. AT_TV: Broadcast Television and Radio Retrieval, 2000, RIAO.

[28] P. Beek, et al. Text of 15938-5 FCD Information Technology-Multimedia Content Description Interface-Part 5 Multimedia Description Schemes, 2001.

[29] Steffen Staab, et al. The Ontology Inference Layer OIL, 2000.

[30] Amarnath Gupta, et al. Visual information retrieval, 1997, CACM.

[31] François Pachet, et al. A taxonomy of musical genres, 2000, RIAO.

[32] Nicola Orio, et al. SMILE: a System for Content-based Musical Information Retrieval Environments, 2000, RIAO.

[33] Tim Berners-Lee, et al. Weaving The Web: The Original Design And Ultimate Destiny of the World Wide Web, 1999.

[34] Yumi Sohn, et al. MPEG-7 metadata authoring tool, 2002, MULTIMEDIA '02.

[35] Steven J. DeRose, et al. XML Path Language (XPath) Version 1.0, 1999.

[36] Marc Davis, et al. Media streams: representing video for retrieval and repurposing, 1994, MULTIMEDIA '94.