An investigation into feature construction to assist word sense disambiguation

Identifying the correct sense of a word in context is crucial for many tasks in natural language processing (machine translation is an example). State-of-the-art methods for Word Sense Disambiguation (WSD) build models using hand-crafted features that usually capture shallow linguistic information. Complex background knowledge, such as semantic relationships, is typically either not used, or used in a specialised manner, due to the limitations of the feature-based modelling techniques used. On the other hand, empirical results from the use of Inductive Logic Programming (ILP) systems have repeatedly shown that they can use diverse sources of background knowledge when constructing models. In this paper, we investigate whether this ability of ILP systems could be used to improve the predictive accuracy of models for WSD. Specifically, we examine the use of a general-purpose ILP system as a method to construct a set of features using semantic, syntactic and lexical information. This feature-set is then used by a common modelling technique in the field (a support vector machine) to construct a classifier for predicting the sense of a word. In our investigation we examine one-shot and incremental approaches to feature-set construction applied to monolingual and bilingual WSD tasks. The monolingual tasks use 32 verbs and 85 verbs and nouns (in English) from the SENSEVAL-3 and SemEval-2007 benchmarks, while the bilingual WSD task consists of 7 highly ambiguous verbs in translating from English to Portuguese. The results are encouraging: the ILP-assisted models show substantial improvements over those that simply use shallow features. In addition, incremental feature-set construction appears to identify smaller and better sets of features. Taken together, the results suggest that the use of ILP with diverse sources of background knowledge provides a way of making substantial progress in the field of WSD.
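As a rough illustration of the feature-construction step described above, the sketch below turns a handful of clause-like tests into binary feature vectors, one column per clause. It is not the system used in the paper: the example fields (`context`, `subject_pos`, `object_classes`), the helper names and the toy data are all assumptions made for illustration, and the clauses are written by hand here rather than found by an ILP system. In the setting the abstract describes, vectors of this kind would then be passed to a support vector machine.

```python
from typing import Any, Callable, Dict, List

# Hypothetical relational description of one occurrence of an ambiguous word.
Example = Dict[str, Any]

# A clause is modelled as a boolean test over an example. In an ILP setting
# these tests would correspond to clause bodies found by the ILP system.
Clause = Callable[[Example], bool]


def has_object_class(cls: str) -> Clause:
    """True if the verb's object belongs to the given semantic class."""
    return lambda ex: cls in ex.get("object_classes", ())


def has_subject_pos(pos: str) -> Clause:
    """True if the verb's subject carries the given part-of-speech tag."""
    return lambda ex: ex.get("subject_pos") == pos


def has_context_word(word: str) -> Clause:
    """True if the given word occurs in the surrounding context."""
    return lambda ex: word in ex.get("context", ())


def propositionalise(examples: List[Example], clauses: List[Clause]) -> List[List[int]]:
    """Map each example to a 0/1 vector: entry j is 1 iff clause j holds."""
    return [[1 if clause(ex) else 0 for clause in clauses] for ex in examples]


# Toy data (assumed, not taken from any benchmark).
examples = [
    {"context": ["bank", "loan"], "subject_pos": "NNP", "object_classes": ["money"]},
    {"context": ["river"], "subject_pos": "NN", "object_classes": ["place"]},
]
clauses = [
    has_object_class("money"),   # semantic information
    has_subject_pos("NN"),       # syntactic information
    has_context_word("river"),   # lexical information
]

features = propositionalise(examples, clauses)
```

Each row of `features` is the representation of one example. A one-shot approach would build the whole clause list once; an incremental approach would extend `clauses` over several rounds and recompute the vectors.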
