Dynamic organization of an audiovisual database using a user-defined similarity measure based on low-level features

In this paper we explore how to let a user interactively organize a multimedia database through a dynamic interface, freely creating their own "audiovisual concepts". The user defines distances on a small subset of documents, based on low-level audio and video descriptors that are automatically extracted off-line. A semi-supervised learning process, relying on support vector regression in an early fusion setting, builds a behavioral model of these descriptors from the human interaction, yielding a personal audiovisual similarity measure.
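To make the approach concrete, the sketch below shows one possible reading of the pipeline: low-level audio and video descriptors are concatenated (early fusion), user-defined distances on a few document pairs serve as regression targets, and support vector regression learns a personal distance function. It is a minimal illustration, not the authors' implementation; the synthetic descriptors, the pair encoding as an absolute difference of fused vectors, and all parameter values are assumptions.

```python
"""Minimal sketch: a personal similarity measure learned with SVR and early fusion.
All data, the pair encoding, and the hyperparameters are illustrative assumptions."""
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Off-line extracted low-level descriptors for each document (synthetic here).
n_docs, audio_dim, video_dim = 50, 12, 20
audio = rng.normal(size=(n_docs, audio_dim))   # e.g. spectral / MFCC statistics
video = rng.normal(size=(n_docs, video_dim))   # e.g. color / texture statistics

# Early fusion: concatenate audio and video descriptors into one vector per document.
fused = StandardScaler().fit_transform(np.hstack([audio, video]))

# The user defines distances on a small subset of document pairs (doc_i, doc_j, distance).
labeled_pairs = [(0, 1, 0.1), (0, 2, 0.9), (3, 4, 0.2), (5, 6, 0.8), (7, 8, 0.5)]

# Encode each pair as the absolute difference of its fused descriptors
# (one simple choice among several possible pair encodings).
X = np.array([np.abs(fused[i] - fused[j]) for i, j, _ in labeled_pairs])
y = np.array([d for _, _, d in labeled_pairs])

# Support vector regression maps descriptor differences to the user-defined distances,
# acting as a behavioral model of the descriptors for this particular user.
model = SVR(kernel="rbf", C=10.0, epsilon=0.05).fit(X, y)

def personal_distance(i: int, j: int) -> float:
    """Predicted user-specific distance between documents i and j."""
    pair = np.abs(fused[i] - fused[j]).reshape(1, -1)
    return float(model.predict(pair)[0])

print(personal_distance(0, 3))  # distance used to reorganize the database dynamically
```

In such a setup, the learned `personal_distance` function could then rank or cluster the remaining documents, which is how the interface would reorganize the database around the user's emerging audiovisual concepts.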