An Unsupervised Method for Multilingual Word Sense Tagging Using Parallel Corpora

With an increasing number of languages making their way to our desktops everyday via the Internet, researchers have come to realize the lack of linguistic knowledge resources for scarcely represented/studied languages. In an attempt to bootstrap some of the required linguistic resources for some of those languages, this paper presents an unsupervised method for automatic multilingual word sense tagging using parallel corpora. The method is evaluated on the English Brown corpus and its translation into three different languages: French, German and Spanish. A preliminary evaluation of the proposed method yielded results of up to 79% accuracy rate for the English data on 81.8% of the SemCor manually tagged data.

[1]  Carlo Strapparava,et al.  Experiments in Word Domain Disambiguation for Parallel Texts , 2000, ACL 2000.

[2]  Philip Resnik,et al.  Mining the Web for Bilingual Text , 1999, ACL.

[3]  David Yarowsky,et al.  Word-Sense Disambiguation Using Statistical Models of Roget’s Categories Trained on Large Corpora , 2010, COLING.

[4]  I. Dan Melamed,et al.  Models of translation equivalence among words , 2000, CL.

[5]  Sara Laviosa Johansson, Stig & Signe Oksefjell, eds. 1998. Corpora and Cross-Linguistic Research: Theory, Method, and Case Studies. , 2000 .

[6]  David Yarowsky,et al.  Unsupervised Word Sense Disambiguation Rivaling Supervised Methods , 1995, ACL.

[7]  George A. Miller,et al.  Using a Semantic Concordance for Sense Identification , 1994, HLT.

[8]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[9]  Philip Resnik,et al.  Selectional Preference and Sense Disambiguation , 1997 .

[10]  Nancy Ide,et al.  © 1999 Kluwer Academic Publishers. Printed in the Netherlands Cross-lingual Sense Determination: Can It Work? , 2022 .

[11]  Alon Itai,et al.  Word Sense Disambiguation Using a Second Language Monolingual Corpus , 1994, CL.

[12]  Robert L. Mercer,et al.  The Mathematics of Statistical Machine Translation: Parameter Estimation , 1993, CL.

[13]  Philip Resnik,et al.  Semantic Similarity in a Taxonomy: An Information-Based Measure and its Application to Problems of Ambiguity in Natural Language , 1999, J. Artif. Intell. Res..

[14]  Janyce Wiebe,et al.  Word-Sense Disambiguation Using Decomposable Models , 1994, ACL.

[15]  R. Burchfield Frequency Analysis of English Usage: Lexicon and Grammar. By W. Nelson Francis and Henry Kučera with the assistance of Andrew W. Mackie. Boston: Houghton Mifflin. 1982. x + 561 , 1985 .

[16]  Hermann Ney,et al.  A Comparison of Alignment Models for Statistical Machine Translation , 2000, COLING.

[17]  ResnikPhilip,et al.  Distinguishing systems and distinguishing senses: new evaluation methods for Word Sense Disambiguation , 1999 .

[18]  George A. Miller,et al.  Introduction to WordNet: An On-line Lexical Database , 1990 .