Reuse of free resources in machine translation between Nynorsk and Bokmål

We describe the development of a two-way shallow-transfer machine translation system between Norwegian Nynorsk and Norwegian Bokmål, built on the Apertium platform using the free and open-source resources Norsk Ordbank and the Oslo-Bergen Constraint Grammar tagger. We detail how these and other resources are integrated into the system and how the lexical and structural transfer were constructed, and we evaluate the translation quality against another system. Finally, we suggest some future work.
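To make the idea of lemma-based shallow transfer concrete, here is a minimal toy sketch. It assumes three stages only: morphological analysis of Nynorsk, lexical transfer of lemmas through a bilingual dictionary, and morphological generation of Bokmål. The dictionaries, tag names and function names are hypothetical and cover only one example sentence. A real Apertium system also has a disambiguation step (the abstract names the Oslo-Bergen Constraint Grammar tagger) and structural transfer rules, both of which are omitted here.

```python
from typing import Dict, List, Tuple

# Toy Nynorsk -> Bokmål shallow-transfer pipeline.
# All entries and tag names below are hypothetical illustrations,
# not the actual dictionaries or tagset of the system described above.

Tags = Tuple[str, ...]
Analysis = Tuple[str, Tags]  # (lemma, morphological tags)

UNKNOWN: Tags = ("unknown",)

# Stage 1 data: Nynorsk surface form -> (lemma, tags). Unambiguous here,
# so no disambiguation step is modelled.
NN_ANALYSER: Dict[str, Analysis] = {
    "eg": ("eg", ("prn", "p1", "sg", "nom")),
    "kjem": ("kome", ("vblex", "pres")),
    "ikkje": ("ikkje", ("adv",)),
    "heim": ("heim", ("adv",)),
    "no": ("no", ("adv",)),
}

# Stage 2 data: bilingual dictionary, Nynorsk lemma -> Bokmål lemma.
NN_NB_BIDIX: Dict[str, str] = {
    "eg": "jeg",
    "kome": "komme",
    "ikkje": "ikke",
    "heim": "hjem",
    "no": "nå",
}

# Stage 3 data: Bokmål (lemma, tags) -> surface form.
NB_GENERATOR: Dict[Analysis, str] = {
    ("jeg", ("prn", "p1", "sg", "nom")): "jeg",
    ("komme", ("vblex", "pres")): "kommer",
    ("ikke", ("adv",)): "ikke",
    ("hjem", ("adv",)): "hjem",
    ("nå", ("adv",)): "nå",
}


def analyse(tokens: List[str]) -> List[Analysis]:
    """Stage 1: map each token to (lemma, tags); unknown words keep their form."""
    return [NN_ANALYSER.get(t.lower(), (t.lower(), UNKNOWN)) for t in tokens]


def lexical_transfer(analyses: List[Analysis]) -> List[Analysis]:
    """Stage 2: replace each lemma via the bilingual dictionary; tags pass through."""
    return [(NN_NB_BIDIX.get(lemma, lemma), tags) for lemma, tags in analyses]


def generate(analyses: List[Analysis]) -> List[str]:
    """Stage 3: inflect each (lemma, tags); forms it cannot generate get a '*' mark."""
    return [NB_GENERATOR.get((lemma, tags), "*" + lemma) for lemma, tags in analyses]


def translate(sentence: str) -> str:
    tokens = sentence.split()
    words = generate(lexical_transfer(analyse(tokens)))
    # Copy an initial capital letter from each source token (a simplification).
    return " ".join(w.capitalize() if t[:1].isupper() else w for t, w in zip(tokens, words))


print(translate("Eg kjem ikkje heim no"))  # Jeg kommer ikke hjem nå
```

Because transfer works on lemmas plus tags rather than surface strings, the single bilingual entry kome → komme is enough for the generator to produce the present tense "kommer" from "kjem". Other inflected forms would need only further generator entries, not further bilingual entries.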
