How hard is it to automatically translate phrasal verbs from English to French?

Translating English phrasal verbs (PVs) into French is a challenge, especially when the verb is separated from its particle. Our goal is to quantify how well current SMT paradigms translate split PVs into French. We compare two in-house SMT systems, one phrase-based and one hierarchical, on a test set of PVs. Our analysis is based on a carefully designed evaluation protocol for assessing the translation quality of a specific linguistic phenomenon. We find that (a) current SMT technology translates only 27% of PVs correctly, (b) in spite of their simplistic model, phrase-based systems outperform hierarchical systems, and (c) translation quality is higher when both systems translate the PV similarly.
