A multilingual corpus for rich audio-visual scene description in a meeting-room environment

In this paper, we present a multilingual database specifically designed to support the development of technologies for rich audio-visual scene description in meeting-room environments. Part of the database comprises the already existing CHIL audio-visual recordings, whose annotations have been extended. A key objective of the newly recorded sessions was to capture situations in which the semantic content cannot be extracted from a single modality. The presented database, which includes five hours of largely spontaneous scientific presentations, was manually annotated using standard or previously reported annotation schemes, and will be made publicly available for research purposes.