Multilingual versus monolingual word sense disambiguation

This article describes two word sense disambiguation (WSD) systems. The first applies to parallel corpora and requires aligned wordnets. The second is knowledge-poorer but more relevant for real applications: it relies on unsupervised learning methods and needs only monolingual data (text and a wordnet). Comparing the performance of WSD systems is very difficult when they use different sense inventories, and harder still when the sense distinctions differ in granularity. Because we used the same sense inventory for both systems, their performance can be compared objectively, and we provide evidence that multilingual WSD is more precise than monolingual WSD.
