Machine Translation: Phrase-Based, Rule-Based and Neural Approaches with Linguistic Evaluation

Abstract

In this article we present a novel, linguistically driven evaluation method and apply it to the three main approaches to Machine Translation (rule-based, phrase-based and neural). The goal is to gain much more detailed insight into their strengths and weaknesses than current evaluation schemes provide. Translating between two languages requires substantial modelling of knowledge about the two languages, about translation, and about the world. Using English-German IT-domain translation as a case study, we also enhance the phrase-based system in two ways: by exploiting parallel treebanks for syntax-aware phrase extraction, and by interfacing with Linked Open Data (LOD) to extract named-entity translations in a post-decoding framework.
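The abstract argues that a linguistically driven evaluation reveals more than an aggregate score does. The following is a minimal Python sketch of that idea, not the authors' method. It assumes the evaluation is organized as a test suite in which each item targets one linguistic phenomenon, and that each system output has already been judged correct or incorrect on that phenomenon. The `TestItem` type, the `accuracy_by_phenomenon` function, and all phenomenon labels and judgements below are hypothetical illustrations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TestItem:
    phenomenon: str  # hypothetical category label, e.g. "named entity"
    system: str      # MT system that produced the judged output
    correct: bool    # judgement of the output on its target phenomenon


def accuracy_by_phenomenon(items):
    """Return {system: {phenomenon: fraction of items judged correct}}."""
    # counts[system][phenomenon] holds [number_correct, number_total]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for item in items:
        cell = counts[item.system][item.phenomenon]
        cell[0] += int(item.correct)
        cell[1] += 1
    return {
        system: {ph: c / n for ph, (c, n) in by_ph.items()}
        for system, by_ph in counts.items()
    }


if __name__ == "__main__":
    # Hypothetical judgements for illustration only, not results from the article.
    items = [
        TestItem("named entity", "phrase-based", True),
        TestItem("named entity", "phrase-based", False),
        TestItem("named entity", "neural", True),
        TestItem("named entity", "neural", True),
        TestItem("verb particle", "rule-based", True),
        TestItem("verb particle", "neural", False),
    ]
    for system, scores in accuracy_by_phenomenon(items).items():
        for ph, acc in sorted(scores.items()):
            print(f"{system:12s} {ph:14s} {acc:.2f}")
```

Reporting a breakdown per phenomenon and per system, rather than one averaged number, is what allows two systems with similar overall scores to be distinguished by where each one fails.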
