Combining Lexical and Syntactic Features for Supervised Word Sense Disambiguation

The success of supervised learning approaches to word sense disambiguation depends largely on the features used to represent the context in which an ambiguous word occurs. Previous work has reached mixed conclusions: some studies suggest that combinations of syntactic and lexical features perform most effectively, while others have shown that simple lexical features perform well on their own. This paper evaluates the effect of using different lexical and syntactic features, both individually and in combination. We show that a very simple ensemble that uses a single lexical feature and a sequence of part-of-speech features can achieve disambiguation accuracy near the state of the art.
