Paper Information - SAVAS: Collecting, Annotating and Sharing Audiovisual Language Resources for Automatic Subtitling

This paper describes the data collection, annotation and sharing activities carried out within the FP7 EU-funded SAVAS project. The project aims to collect, share and reuse audiovisual language resources from broadcasters and subtitling companies to develop large vocabulary continuous speech recognisers in specific domains and new languages, with the purpose of solving the automated subtitling needs of the media industry.
