MT Adaptation for Under-Resourced Domains - What Works and What Not

In this paper the authors present various techniques of how to achieve MT domain adaptation with limited in-domain resources. This paper gives a case study of what works and what not if one has to build a domain specific machine translation system. Systems are adapted using in-domain comparable monolingual and bilingual corpora (crawled from the Web) and bilingual terms and named entities. The authors show how to efficiently integrate terms within statistical machine translation systems, thus significantly improving upon the baseline.

[1]  Tatiana Gornostay,et al.  LetsMT! - Online Platform for Sharing Training Data and Building User Tailored Machine Translation , 2010, Baltic HLT.

[2]  Alon Lavie,et al.  METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments , 2005, IEEvaluation@ACL.

[3]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[4]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[5]  Philipp Koehn,et al.  Experiments in Domain Adaptation for Statistical Machine Translation , 2007, WMT@ACL.

[6]  Marcin Junczys-Dowmunt 16th Annual Conference of the European Association for Machine Translation (EAMT) , 2012 .

[7]  Nikola Ljubešić,et al.  Term Extraction, Tagging, and Mapping Tools for Under-Resourced Languages , 2012 .

[8]  Barry Haddow,et al.  Improved Minimum Error Rate Training in Moses , 2009, Prague Bull. Math. Linguistics.

[9]  Matthew G. Snover,et al.  A Study of Translation Edit Rate with Targeted Human Annotation , 2006, AMTA.

[10]  Marcis Pinnis Latvian and Lithuanian Named Entity Recognition with TildeNER , 2012, LREC.

[11]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[12]  Sabine Hunsicker,et al.  Hybrid Parallel Sentence Mining from Comparable Corpora , 2012, EAMT.

[13]  William D. Lewis,et al.  Achieving Domain Specificity in SMT without Overt Siloing , 2010, LREC.

[14]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.

[15]  Ralph Weischedel,et al.  A STUDY OF TRANSLATION ERROR RATE WITH TARGETED HUMAN ANNOTATION , 2005 .