Paper information - Construction of a Benchmark Data Set for Cross-lingual Word Sense Disambiguation

Given the recent trend toward evaluating word sense disambiguation systems in a more application-oriented set-up, we report on the construction of a multilingual benchmark data set for cross-lingual word sense disambiguation. The data set was created for a lexical sample of 25 English nouns, for which translations were retrieved in five languages: Dutch, German, French, Italian and Spanish. The sense inventory was built from the Europarl parallel corpus. The gold standard sense inventory was based on the automatic word alignments of the parallel corpus, which were manually verified. The verified word alignments were then used to manually cluster the translations across all languages in the parallel corpus. The resulting inventory served as input for the annotators of the sentences, who were asked to provide a maximum of three contextually relevant translations per language for a given focus word. The data set was released in the framework of the SemEval-2010 competition.

Véronique Hoste | Els Lefever

[1] Marianna Apidianaki, et al. Data-Driven Semantic Analysis for Multilingual WSD and Lexical Selection in Translation, 2009, EACL.

[2] Christiane Fellbaum, et al. Book Reviews: WordNet: An Electronic Lexical Database, 1999, CL.

[3] Hwee Tou Ng, et al. Exploiting Parallel Texts for Word Sense Disambiguation: An Empirical Study, 2003, ACL.

[4] Véronique Hoste, et al. SemEval-2010 Task 3: Cross-Lingual Word Sense Disambiguation, 2010, SemEval@ACL.

[5] Rada Mihalcea, et al. SemEval-2010 Task 2: Cross-Lingual Lexical Substitution, 2009, SemEval@ACL.

[6] Philipp Koehn, et al. Europarl: A Parallel Corpus for Statistical Machine Translation, 2005, MTSUMMIT.

[7] Nancy Ide, et al. Fine-Grained Word Sense Disambiguation Based on Parallel Corpora, Word Alignment, Word Clustering and Aligned Wordnets, 2004, COLING.

[8] David Yarowsky, et al. A method for disambiguating word senses in a large corpus, 1992, Comput. Humanit.

[9] Roberto Navigli, et al. Word sense disambiguation: A survey, 2009, CSUR.

[10] Hermann Ney, et al. A Systematic Comparison of Various Statistical Alignment Models, 2003, CL.

[11] B. T. S. Atkins, et al. Building a Lexicon: The Contribution of Lexicography, 1991.

[12] Nancy Ide, et al. Sense Discrimination with Parallel Corpora, 2002, SENSEVAL.

[13] Kenneth Ward Church, et al. A Program for Aligning Sentences in Bilingual Corpora, 1993, CL.