Toponym Disambiguation in an English-Lithuanian SMT System with Spatial Knowledge

This paper presents an innovative research resulting in the English-Lithuanian statistical factored phrase-based machine translation system with a spatial ontology. The system is based on the Moses toolkit and is enriched with semantic knowledge inferred from the spatial ontology. The ontology was developed on the basis of the GeoNames database (more than 15 000 toponyms), implemented in the web ontology language (OWL), and integrated into the machine translation process. Spatial knowledge was added as an additional factor in the statistical translation model and used for toponym disambiguation during machine translation. The implemented machine translation approach was evaluated against the baseline system without spatial knowledge. A multifaceted evaluation strategy including automatic metrics, human evaluation and linguistic analysis, was implemented to perform evaluation experiments. The results of the evaluation have shown a slight improvement in the output quality of machine translation with spatial knowledge.

[1]  Raivis Skadins,et al.  Improving SMT for Baltic Languages with Factored Models , 2010, Baltic HLT.

[2]  Philipp Koehn,et al.  Moses: Open Source Toolkit for Statistical Machine Translation , 2007, ACL.

[3]  Philipp Koehn,et al.  Findings of the 2009 Workshop on Statistical Machine Translation , 2009, WMT@EACL.

[4]  Salim Roukos,et al.  Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.

[5]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[6]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[7]  Jörg Tiedemann,et al.  News from OPUS — A collection of multilingual parallel corpora with tools and interfaces , 2009 .

[8]  Tatiana Gornostay,et al.  English-Latvian Toponym Processing: Translation Strategies and Linguistic Patterns , 2009, EAMT.

[9]  Daniel Marcu,et al.  SPMT: Statistical Machine Translation with Syntactified Target Language Phrases , 2006, EMNLP.

[10]  Philipp Koehn,et al.  Factored Translation Models , 2007, EMNLP.

[11]  George R. Doddington,et al.  Automatic Evaluation of Machine Translation Quality Using N-gram Co-Occurrence Statistics , 2002 .

[12]  Chris Callison-Burch,et al.  Demonstration of Joshua: An Open Source Toolkit for Parsing-based Machine Translation , 2009, ACL.

[13]  Anthony G. Cohn,et al.  A Spatial Logic based on Regions and Connection , 1992, KR.

[14]  Chris Quirk,et al.  Dependency Treelet Translation: Syntactically Informed Phrasal SMT , 2005, ACL.

[15]  Daniel Marcu,et al.  Statistical Phrase-Based Translation , 2003, NAACL.

[16]  David Chiang,et al.  Hierarchical Phrase-Based Translation , 2007, CL.

[17]  Jochen L. Leidner Toponym resolution in text: annotation, evaluation and applications of spatial grounding , 2007, SIGF.