A reduced yet extensible audio-visual description language

Enabling intelligent access to multimedia data requires a powerful description language. In this paper, we demonstrate why the MPEG-7 standard fails to fulfill this task. We then introduce our proposal: a modular description language specific to audio-visual content, deliberately reduced in scope but designed to be extensible. This language is centered on the notions of <i>descriptor</i> and <i>structure</i>, each with well-defined semantics. A descriptor can be either a low-level feature automatically extracted from the signal or a higher-level semantic concept used to annotate video documents. Descriptors can be combined into structures according to defined models that provide description patterns.
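
To make the relationship between descriptors, structures, and models concrete, the following is a minimal sketch in Python. It is not the syntax of the proposed language: the class names (`Descriptor`, `Structure`, `Model`), the `level` field, and the example descriptor names are all hypothetical and chosen only for illustration. It assumes that a model can be reduced to a list of descriptor names that a structure must contain.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Descriptor:
    """A named value: a low-level signal feature or a semantic annotation."""
    name: str
    value: object
    level: str  # hypothetical tag: "low" (extracted) or "semantic" (annotated)


@dataclass
class Structure:
    """A combination of descriptors built according to a model."""
    model: str
    members: list = field(default_factory=list)


@dataclass(frozen=True)
class Model:
    """A description pattern: here, simply the descriptor names it requires."""
    name: str
    required: tuple

    def build(self, *members: Descriptor) -> Structure:
        # Reject any combination that does not match the pattern.
        present = {m.name for m in members}
        missing = [r for r in self.required if r not in present]
        if missing:
            raise ValueError(f"{self.name} is missing descriptors: {missing}")
        return Structure(self.name, list(members))


# Hypothetical model: a shot is described by one extracted feature
# and one semantic annotation.
shot_model = Model("Shot", ("color_histogram", "topic"))
shot = shot_model.build(
    Descriptor("color_histogram", [0.2, 0.5, 0.3], "low"),
    Descriptor("topic", "interview", "semantic"),
)
```

In this sketch, extensibility amounts to declaring a new `Model` or a new descriptor name without changing the three core classes.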