Cross-lingual Sentence Compression for Subtitles

We present an approach to subtitle translation in which the standard time and space constraints are modeled as part of translation generation in a phrase-based statistical machine translation (PB-SMT) system. We propose and experiment with two promising strategies for jointly translating and compressing subtitles from English into Portuguese. Translation quality is measured through human post-editing of the automatic translations until they are adequate, fluent, and compliant with the time and space constraints. Experiments show that carefully selecting the data used to tune the model parameters of the PB-SMT system already improves over an unconstrained baseline, and that adding specific model components to guide the translation process can further improve the final translations under certain conditions.
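
To make the notion of compliance with time and space constraints concrete, the following minimal sketch checks a single translated subtitle against a line-length limit and a reading-speed limit. The specific thresholds (40 characters per line, 2 lines, 15 characters per second) and the names `Subtitle` and `is_compliant` are illustrative assumptions, not values or components taken from this work.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration; real subtitling
# guidelines vary by broadcaster and language.
MAX_CHARS_PER_LINE = 40
MAX_LINES = 2
MAX_CHARS_PER_SECOND = 15.0


@dataclass
class Subtitle:
    text: str     # subtitle text, with lines separated by "\n"
    start: float  # display start time in seconds
    end: float    # display end time in seconds


def is_compliant(sub: Subtitle) -> bool:
    """Return True if the subtitle meets the assumed space and time limits."""
    lines = sub.text.split("\n")
    duration = sub.end - sub.start
    if duration <= 0:
        return False

    # Space constraint: number of lines and length of each line.
    space_ok = len(lines) <= MAX_LINES and all(
        len(line) <= MAX_CHARS_PER_LINE for line in lines
    )

    # Time constraint: characters shown per second of display time.
    n_chars = sum(len(line) for line in lines)
    time_ok = n_chars / duration <= MAX_CHARS_PER_SECOND

    return space_ok and time_ok
```

A translation that fails such a check is one that would need compression, either during generation or by a post-editor.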