TED-MWE: a bilingual parallel corpus with MWE annotation

We gratefully acknowledge the support of the PARSEME IC1207 COST Action for this work. We particularly thank Manuela Cherchi, Erika Ibba, Anna De Santis, Giuseppe Casu, Jessica Ladu, Ilaria Del Rio, Elisa Virdis, and Gino Castangia for their annotation work.
