Evaluating knowledge-poor and knowledge-rich features in automatic classification: A case study in WSD

Word Sense Disambiguation (WSD) is a fundamental task in many Computational Linguistics applications. It consists of automatically identifying the sense of an ambiguous word in its context. This work evaluates the disambiguation performance of five machine learning classifiers: Naive Bayes, Support Vector Machines, Decision Trees, KStar, and Maximum Entropy. We compare how these algorithms perform with knowledge-rich and with knowledge-poor features on Portuguese data.
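As a rough illustration of the task setup, the sketch below trains a Naive Bayes classifier (one of the five algorithms named above) on knowledge-poor features, taken here to mean only the surface words in a small window around the target. The feature definition, the toy sentences for the Portuguese noun "banco", the sense labels, and all function names are assumptions made for this example; they are not the data, features, or implementation evaluated in the paper.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data for the ambiguous Portuguese noun "banco"
# ("finance" = bank, "seat" = bench). Not taken from the paper.
TRAIN = [
    ("fui ao banco sacar dinheiro da conta", "finance"),
    ("o banco aumentou os juros do empréstimo", "finance"),
    ("abri uma conta no banco ontem", "finance"),
    ("sentei no banco da praça para descansar", "seat"),
    ("o banco de madeira do jardim quebrou", "seat"),
    ("ela descansou no banco do parque", "seat"),
]


def context_features(sentence, target="banco", window=3):
    """Knowledge-poor features (assumed): words within +/- `window` tokens of the target."""
    tokens = sentence.lower().split()
    i = tokens.index(target)
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def train_naive_bayes(examples):
    """Count senses and per-sense context words."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sentence, sense in examples:
        sense_counts[sense] += 1
        for word in context_features(sentence):
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def classify(sentence, model):
    """Return the sense with the highest log-probability (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context_features(sentence):
            if word in vocab:  # words unseen in training are ignored
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


model = train_naive_bayes(TRAIN)
print(classify("preciso ir ao banco pagar a conta", model))
print(classify("o menino dormiu no banco do jardim", model))
```

A knowledge-rich variant would keep the same classifier but extend `context_features` with linguistically informed attributes, such as part-of-speech tags of the neighbouring words, so that the two feature sets can be compared under otherwise identical conditions.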
