A machine-assisted human translation system for technical documents

Translation systems are known to benefit from the availability of a bilingual lexicon for a domain of interest. A system that aims to build such a lexicon from a source-language corpus often requires human assistance and faces two conflicting requirements: minimizing human translation effort while improving translation quality. We present an approach that exploits redundancy in the source corpus and extracts recurring patterns that are frequent, syntactically well-formed, and provide maximum corpus coverage. The patterns generalize over phrases and word types, and our approach finds a succinct set of good patterns with high coverage. Our interactive system leverages these patterns in multiple iterations of translation and post-editing, thereby progressively generating a high-quality bilingual lexicon.
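The idea of choosing a small set of frequent patterns with high corpus coverage can be illustrated with a minimal sketch. This is not the authors' method: it assumes whitespace tokenization, uses fixed-length n-grams as stand-ins for patterns (no syntactic well-formedness check, no generalization over phrases or word types), and applies a greedy set-cover heuristic. The names `candidate_positions`, `select_patterns`, `min_freq`, and `max_patterns` are hypothetical.

```python
from collections import defaultdict


def candidate_positions(sentences, n=2, min_freq=2):
    """Map each n-gram seen at least min_freq times to the token positions it covers."""
    positions = defaultdict(set)
    freq = defaultdict(int)
    for s_idx, sentence in enumerate(sentences):
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            # Record every (sentence, token) position this occurrence spans.
            positions[gram].update((s_idx, j) for j in range(i, i + n))
    return {g: pos for g, pos in positions.items() if freq[g] >= min_freq}


def select_patterns(sentences, n=2, min_freq=2, max_patterns=5):
    """Greedily pick the n-grams that add the most not-yet-covered token positions."""
    candidates = candidate_positions(sentences, n, min_freq)
    covered = set()
    chosen = []
    while candidates and len(chosen) < max_patterns:
        # Sorting first makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(candidates), key=lambda g: len(candidates[g] - covered))
        if not candidates[best] - covered:
            break  # no remaining candidate adds coverage
        chosen.append(best)
        covered |= candidates.pop(best)
    total = sum(len(s.split()) for s in sentences)
    coverage = len(covered) / total if total else 0.0
    return chosen, coverage


if __name__ == "__main__":
    corpus = [
        "press the power button",
        "press the reset button",
        "hold the power button",
    ]
    patterns, coverage = select_patterns(corpus)
    print(patterns, coverage)
```

In an interactive setting, a translator would translate only the selected patterns first, and the fraction returned as `coverage` gives a rough sense of how much of the corpus those translations touch.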
