Improving Word Translation Disambiguation by Capturing Multiword Expressions with Dictionaries

The paper describes a method for identifying and translating multiword expressions using a bi-directional dictionary. While a dictionary-based approach suffers from limited recall, its precision is high; hence it is best employed alongside an approach with complementary properties, such as an n-gram language model. We evaluate the method on data from the English-German part of the cross-lingual word sense disambiguation task in the 2010 semantic evaluation exercise (SemEval). The output of a baseline disambiguation system based on n-grams was substantially improved by matching the target words and their immediate contexts against compound and collocational words in a dictionary.
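The dictionary lookup described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary contents, the function names `match_mwe` and `translate`, the `max_len` window limit, and the `baseline` callback (standing in for the n-gram system) are all assumptions made for illustration. It checks whether the target word and its immediate context form a multiword dictionary entry, and falls back to the baseline when they do not.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Toy English-German multiword entries (hypothetical data, for illustration only).
MWE_DICT: Dict[Tuple[str, ...], str] = {
    ("bank", "account"): "Bankkonto",
    ("river", "bank"): "Flussufer",
}


def match_mwe(tokens: List[str], target_idx: int, max_len: int = 3) -> Optional[str]:
    """Return the translation of the longest dictionary entry covering the target word."""
    for n in range(max_len, 1, -1):  # try longer windows first
        first = max(0, target_idx - n + 1)
        last = min(target_idx, len(tokens) - n)
        # Slide an n-token window over every position that includes the target.
        for start in range(first, last + 1):
            window = tuple(t.lower() for t in tokens[start:start + n])
            if window in MWE_DICT:
                return MWE_DICT[window]
    return None


def translate(
    tokens: List[str],
    target_idx: int,
    baseline: Callable[[List[str], int], str],
) -> str:
    """Prefer a dictionary match (high precision); otherwise defer to the baseline."""
    hit = match_mwe(tokens, target_idx)
    return hit if hit is not None else baseline(tokens, target_idx)


if __name__ == "__main__":
    # Assumed stand-in for an n-gram disambiguation system.
    def toy_baseline(tokens: List[str], idx: int) -> str:
        return "Bank"

    print(translate("He opened a bank account".split(), 3, toy_baseline))
    print(translate("The bank closed".split(), 1, toy_baseline))
```

Letting the dictionary match override the baseline only when an entry is found reflects the trade-off stated in the abstract: the lookup rarely fires, but when it does it is likely to be right.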

Björn Gambäck | André Lynum | Erwin Marsi | Lars Bungum

[1] Masatoshi Yoshikawa, et al. Learning bilingual translations from comparable corpora to cross-language information retrieval: hybrid statistics-based and linguistics-based approach, 2003, IRAL.

[2] Mauro Cettolo, et al. IRSTLM: an open source toolkit for handling large scale language models, 2008, INTERSPEECH.

[3] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MTSUMMIT.

[4] Timothy Baldwin, et al. Multiword Expressions: A Pain in the Neck for NLP, 2002, CICLing.

[5] Véronique Hoste, et al. SemEval-2010 Task 3: Cross-Lingual Word Sense Disambiguation, 2010, SemEval@ACL.

[6] Véronique Hoste, et al. Construction of a Benchmark Data Set for Cross-lingual Word Sense Disambiguation, 2010, LREC.

[7] Carlos Ramisch, et al. Alignment-based extraction of multiword expressions, 2010, Lang. Resour. Evaluation.

[8] Steven Bird, et al. NLTK: The Natural Language Toolkit, 2002, ACL.

[9] M. T. Lino, et al. Proceedings of the 4th International Conference on Language Resources and Evaluation, 2004.

[10] Tomaz Erjavec, et al. The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages, 2006, LREC.

[11] Mauro Cettolo, et al. Efficient Handling of N-gram Language Models for Statistical Machine Translation, 2007, WMT@ACL.

[12] Keh-Jiann Chen, et al. Acquiring Translation Equivalences of Multiword Expressions by Normalized Correlation Frequencies, 2009, EMNLP.

[13] Adam Kilgarriff, et al. Large Linguistically-Processed Web Corpora for Multiple Languages, 2006, EACL.

[14] Björn Gambäck, et al. Disambiguating Word Translations with Target Language Models, 2012, TSD.

[15] Reinhard Rapp, et al. Automatic Identification of Word Translations from Unrelated English and German Corpora, 1999, ACL.

[16] Serge Sharoff, et al. Using collocations from comparable corpora to find translation equivalents, 2006, LREC.

[17] Björn Gambäck, et al. Word Translation Disambiguation without Parallel Texts, 2011.

[18] George Tambouratzis, et al. Implementing a Language-Independent MT Methodology, 2012.

[19] Qun Liu, et al. Improving Statistical Machine Translation Using Domain Bilingual Multiword Expressions, 2009, MWE@IJCNLP.

[20] Pierre Zweigenbaum, et al. Looking for Candidate Translational Equivalents in Specialized, Comparable Corpora, 2002, COLING.