Data-Driven Semantic Analysis for Multilingual WSD and Lexical Selection in Translation

A common way of describing the senses of ambiguous words in multilingual Word Sense Disambiguation (WSD) is by reference to their translation equivalents in another language. The theoretical soundness of the senses induced in this way can, however, be doubted. This type of cross-lingual sense identification also has implications for the evaluation of multilingual WSD and Machine Translation (MT). In this article, we first present arguments in favour of a more thorough analysis of the semantic information that can be induced from the equivalents of ambiguous words found in parallel corpora. We then present an unsupervised WSD method and a lexical selection method that exploit the results of a data-driven sense induction method. Finally, we show how this automatically acquired information can make the evaluation of multilingual WSD and MT more sensitive to lexical semantics.
