Machine Translation of TV Subtitles for Large Scale Production

This paper describes our work on building and employing Statistical Machine Translation systems for TV subtitles in Scandinavia. We have built translation systems for Danish, English, Norwegian and Swedish, which are used in daily subtitle production and translate large volumes. As an example, we report our evaluation results for three TV genres. We also discuss the lessons learned during system development, which shed light on the practical use of Machine Translation technology.
