WORD SENSE CLUSTERING BASED ON TRANSLATION EQUIVALENCE IN PARALLEL TEXTS: A CASE STUDY IN ROMANIAN

Lexical ambiguity is one of the most difficult problems to solve in natural language processing. Word sense disambiguation (WSD) is a task of utmost importance, so the strong interest in this area of research is not surprising. We argue that how difficult sense disambiguation is depends to a large extent on the application that requires it. For many computationally interesting natural language processing applications (such as multilingual intelligent information retrieval, semantic indexing, and machine translation), sense distinctions should consider the so-called "strong" senses, namely those that more often than not are lexicalised differently in another language. We describe our approach, which is based on the concept of translation equivalence, and a case study supporting the claim that a multilingual approach to WSD is much more reliable and more precise than the traditional monolingual treatment of the problem.
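The idea that "strong" senses surface as different lexicalisations in another language can be illustrated with a small sketch. It is not the authors' algorithm. It assumes that each occurrence of an ambiguous source word has already been reduced to the set of its translation equivalents in aligned sentences. It also assumes that occurrences are grouped with single-link agglomerative clustering over Jaccard similarity using a hypothetical `threshold` of 0.3. The functions `jaccard` and `cluster_occurrences` and the English "bank" data are illustrative inventions, not taken from the paper or its corpus.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of translation equivalents (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_occurrences(occurrences: list, threshold: float = 0.3) -> list:
    """Group occurrences of one source word by shared translation equivalents.

    Single-link agglomerative clustering (an assumption, not the paper's
    method): repeatedly merge the two closest clusters while their most
    similar pair of occurrences reaches `threshold`.
    """
    clusters = [[i] for i in range(len(occurrences))]

    def link(c1, c2):
        # Single link: similarity of the closest pair across the two clusters.
        return max(jaccard(occurrences[i], occurrences[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (a, b), best = max(
            (((x, y), link(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break  # no remaining pair shares enough translations
        clusters[a].extend(clusters[b])  # a < b, so deleting b keeps a's index valid
        del clusters[b]
    return clusters


# Hypothetical toy data: translation equivalents of English "bank" in aligned
# Romanian (ro) and French (fr) sentences, one set per occurrence.
occurrences = [
    {"ro:bancă", "fr:banque"},  # financial institution
    {"ro:bancă", "fr:banque"},
    {"ro:mal", "fr:rive"},      # river bank
    {"ro:bancă"},               # French alignment missing for this occurrence
    {"ro:mal", "fr:rive"},
]

clusters = cluster_occurrences(occurrences)
print(clusters)  # [[0, 1, 3], [2, 4]]
```

The two resulting clusters correspond to the two strong senses, and the occurrence with a missing French alignment still joins the financial cluster through its Romanian equivalent. This partial overlap is one way evidence from several target languages can make the grouping more robust than relying on a single language.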
