Extracting Transfer Rules for Multiword Expressions from Parallel Corpora

This paper presents a procedure for extracting transfer rules for multiword expressions from parallel corpora for use in a rule-based Japanese-English MT system. We show that adding the multiword rules improves translation quality, and we sketch ideas for learning more such rules.