A New Approach to Word Sense Disambiguation Based on Context Similarity

One of the major issues in machine translation is word sense disambiguation (WSD), the task of choosing the correct meaning of a word that has several meanings. Supervised learning methods are usually used to solve this problem: disambiguation is carried out using statistics drawn from previously translated documents (as training data) or from bilingual corpora of the source and target languages. In this paper we present a supervised learning method for WSD based on cosine similarity. As the first step, we extract two sets of features: the set of words that occur frequently in the text and the set of words surrounding the ambiguous word. We present the results of evaluating the proposed schemes and illustrate the effect of the proposed weighting strategies. The results are promising compared with existing methods in the literature.

In corpus-based translation methods, translations are generated from statistical or probabilistic models whose parameters are estimated by analyzing a bilingual corpus. Statistical translation studies the frequencies of various linguistic units, including words, lexemes, morphemes and letters, in a sample corpus in order to calculate a set of probabilities that can be used to solve linguistic problems such as ambiguity. In this paper, we present a WSD approach based on the inner product of vectors. The proposed scheme is supervised: sense-tagged data is used to train the classifier.
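The two feature sets and the cosine-similarity classifier described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`extract_features`, `train`, `disambiguate`, `tagged`), the window size, the frequency threshold `min_freq`, and the weights `w_window` and `w_freq` are all hypothetical. Each sense is represented by summing the feature vectors of its sense-tagged training examples, and the toy sentences are invented purely for illustration.

```python
import math
from collections import Counter


def extract_features(tokens, target_index, window=3, min_freq=2,
                     w_window=2.0, w_freq=1.0):
    """Weighted bag-of-words vector for one occurrence of the ambiguous word.

    Assumption: "surrounding words" = tokens within `window` positions of the
    target; "frequent words" = tokens occurring at least `min_freq` times in
    the same text. The weights are illustrative, not values from the paper.
    """
    features = Counter()
    lo = max(0, target_index - window)
    hi = min(len(tokens), target_index + window + 1)
    for i in range(lo, hi):
        if i != target_index:
            features["win:" + tokens[i]] += w_window
    target = tokens[target_index]
    for word, count in Counter(tokens).items():
        if count >= min_freq and word != target:
            features["freq:" + word] += w_freq
    return features


def cosine(u, v):
    """Cosine similarity: inner product divided by the product of the norms."""
    dot = sum(weight * v.get(f, 0.0) for f, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def train(examples, **kw):
    """Sum the feature vectors of all sense-tagged examples for each sense."""
    sense_vectors = {}
    for tokens, index, sense in examples:
        sense_vectors.setdefault(sense, Counter()).update(
            extract_features(tokens, index, **kw))
    return sense_vectors


def disambiguate(tokens, index, sense_vectors, **kw):
    """Return the sense whose vector is most similar to this context."""
    x = extract_features(tokens, index, **kw)
    return max(sense_vectors, key=lambda s: cosine(x, sense_vectors[s]))


def tagged(sentence, sense, target="bank"):
    """Hypothetical helper for the toy data: (tokens, target index, sense)."""
    tokens = sentence.split()
    return tokens, tokens.index(target), sense


# Invented toy training data for the ambiguous English word "bank".
senses = train([
    tagged("she deposited cash at the bank and paid the loan", "finance"),
    tagged("the bank approved the loan and the cash transfer", "finance"),
    tagged("they fished from the grassy bank of the river", "river"),
    tagged("the river overflowed its bank after the rain", "river"),
])

query = "the bank refused my loan request".split()
print(disambiguate(query, query.index("bank"), senses))  # prints: finance
```

Because cosine similarity divides the inner product by both vector lengths, a sense with more training examples does not win simply because its summed vector is longer; only the direction of the vector, that is, the relative weight of its context words, matters.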
