Paper information - The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents

In this paper, we describe the organization and implementation of the CAMOMILE collaborative annotation framework for multimodal, multimedia, multilingual (3M) data. Given the variety of analyses that can be performed on 3M data, the structure of the server was kept intentionally simple to preserve its genericity, and it relies on standard Web technologies. Layers of annotations, defined as data associated with a media fragment from the corpus, are stored in a database and can be managed through standard interfaces with authentication. Interfaces tailored to a specific task can then be developed in an agile way, relying on simple but reliable services for managing the centralized annotations. We then present our implementation of an active learning scenario for person annotation in video, relying on the CAMOMILE server; during a dry run experiment, the manual annotation of 716 speech segments was propagated to 3504 labeled tracks. The code of the CAMOMILE framework is distributed as open source.
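The notion of a layer of annotations, each being data associated with a media fragment, can be illustrated with a minimal in-memory sketch. This is not the CAMOMILE implementation: the class names (`Fragment`, `Annotation`, `Layer`), the field names, and the choice of a temporal fragment given as start and end times in seconds are all assumptions made for illustration, and the database storage and authenticated interfaces mentioned in the abstract are left out.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass(frozen=True)
class Fragment:
    """Hypothetical temporal fragment of one medium (times in seconds)."""
    medium_id: str
    start: float
    end: float


@dataclass
class Annotation:
    """Arbitrary data attached to a media fragment."""
    fragment: Fragment
    data: Any


@dataclass
class Layer:
    """A named collection of annotations (assumed structure, kept in memory)."""
    name: str
    annotations: List[Annotation] = field(default_factory=list)

    def add(self, fragment: Fragment, data: Any) -> Annotation:
        """Create an annotation on the given fragment and store it in the layer."""
        annotation = Annotation(fragment, data)
        self.annotations.append(annotation)
        return annotation

    def for_medium(self, medium_id: str) -> List[Annotation]:
        """Return the annotations whose fragment belongs to the given medium."""
        return [a for a in self.annotations if a.fragment.medium_id == medium_id]


# Example: a "speaker" layer holding person labels on two speech segments.
speakers = Layer("speaker")
speakers.add(Fragment("video_01", 12.0, 15.5), "person_A")
speakers.add(Fragment("video_02", 3.0, 4.2), "person_B")
```

Keeping `data` untyped mirrors the genericity described in the abstract: the same structure could hold a speaker label, a transcription, or a bounding box, with task-specific interfaces deciding how to interpret it.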
