Tilde MT Platform for Developing Client Specific MT Solutions

In this paper, we present Tilde MT, a custom machine translation (MT) platform that provides linguistic data storage (parallel and monolingual corpora, multilingual term collections), data cleaning and normalisation, statistical and neural machine translation system training and hosting functionality, and wide integration capabilities (a machine user API and plugins for popular computer-assisted translation tools). We describe the most important features of the platform and elaborate on typical MT system training workflows for client-specific MT solution development.
