Managing Multimodal Data, Metadata and Annotations

Publisher Summary The goal of this chapter is to outline the main stages in multimodal data management, starting with the capture of multimodal raw data in instrumented spaces. Capturing multimodal corpora requires complex settings, such as instrumented lecture and meeting rooms, which contain capture devices for each of the modalities to be recorded and, most challengingly, the hardware and software needed to digitize and synchronize the acquired signals. The resolution of the capture devices, mainly cameras and microphones, has a determining influence on the quality of the resulting corpus, along with apparently more trivial factors such as the position of these devices in the environment. The number of devices is also important: a larger number provides more information to help define the ground truth for a given annotation dimension. Annotations are the time-dependent information abstracted from the input signals; they include low-level monomodal or multimodal features, as well as higher-level phenomena, whether or not abstracted from those low-level features. Conversely, metadata is the static information about an entire unit of data capture, which stands in no time-dependent relation to the unit's content, i.e., which is generally constant for the entire unit.
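The distinction the summary draws between time-dependent annotations and static per-unit metadata can be sketched as a minimal data model. The following Python sketch is purely illustrative: the class and field names (`CaptureMetadata`, `Annotation`, `CaptureUnit`) are assumptions for this example, not part of any corpus standard discussed in the chapter.

```python
from dataclasses import dataclass, field

@dataclass
class CaptureMetadata:
    """Static information about an entire unit of data capture:
    constant for the whole unit, not time-aligned to its content."""
    session_id: str
    room: str
    participants: list[str]
    devices: list[str]          # e.g. cameras, microphones

@dataclass
class Annotation:
    """Time-dependent information abstracted from the input signals."""
    start_s: float              # start time within the recording, in seconds
    end_s: float                # end time within the recording, in seconds
    dimension: str              # annotation dimension, e.g. "speech-transcript"
    value: str                  # the annotated label or content

@dataclass
class CaptureUnit:
    """One unit of data capture: static metadata plus time-aligned annotations."""
    metadata: CaptureMetadata
    annotations: list[Annotation] = field(default_factory=list)

# Hypothetical example of one recorded meeting.
meeting = CaptureUnit(
    metadata=CaptureMetadata(
        session_id="meeting-001",
        room="smart-room-1",
        participants=["A", "B", "C", "D"],
        devices=["cam-1", "mic-array-1"],
    )
)
meeting.annotations.append(
    Annotation(12.4, 15.1, "speech-transcript", "let's move to the next item")
)
```

The key design point, mirroring the summary, is that `CaptureMetadata` carries no timestamps, while every `Annotation` is anchored to an interval of the recording.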
