A Reexamination of MRD-Based Word Sense Disambiguation

This article reconsiders word sense disambiguation (WSD) based on machine-readable dictionaries (MRDs), extending the basic Lesk algorithm to investigate how different tokenization schemes and methods of definition extension affect WSD performance. In experiments over the Hinoki Sensebank and the Japanese Senseval-2 dictionary task, we demonstrate that sense-sensitive definition extension over hyponyms, hypernyms, and synonyms, combined with definition extension and word tokenization, leads to WSD accuracy above both unsupervised and supervised baselines. In doing so, we demonstrate the utility of ontology induction and establish new opportunities for the development of baseline unsupervised WSD methods.
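To make the general idea concrete, the following is a minimal sketch of Lesk-style overlap scoring with optional definition extension and a pluggable tokenizer. It is not the article's implementation: the toy English dictionary `DEFINITIONS`, the ontology links in `RELATED`, and the function names are all hypothetical, chosen only for illustration, and the scoring is plain set overlap between the context and each sense definition.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy dictionary: sense id -> definition text (illustration only).
DEFINITIONS: Dict[str, str] = {
    "pine#1": "kind of evergreen tree with needle shaped leaves",
    "cone#1": "solid body narrowing to one point",
    "cone#2": "fruit of certain evergreen tree",
}

# Hypothetical ontology links: sense id -> related senses
# (standing in for hypernyms, hyponyms, or synonyms).
RELATED: Dict[str, List[str]] = {"cone#2": ["pine#1"]}

Tokenizer = Callable[[str], Set[str]]


def word_tokens(text: str) -> Set[str]:
    """Whitespace word tokenization."""
    return set(text.lower().split())


def char_bigrams(text: str) -> Set[str]:
    """Character bigram tokenization, ignoring spaces."""
    s = text.lower().replace(" ", "")
    return {s[i:i + 2] for i in range(len(s) - 1)}


def sense_tokens(sense: str, tokenize: Tokenizer, extend: bool) -> Set[str]:
    """Tokens of a sense definition, optionally extended with related senses."""
    tokens = set(tokenize(DEFINITIONS[sense]))
    if extend:
        for related in RELATED.get(sense, []):
            tokens |= tokenize(DEFINITIONS[related])
    return tokens


def lesk(senses: List[str], context: str,
         tokenize: Tokenizer = word_tokens, extend: bool = True) -> str:
    """Return the candidate sense whose definition overlaps most with the context."""
    ctx = tokenize(context)
    return max(senses, key=lambda s: len(sense_tokens(s, tokenize, extend) & ctx))


context = "she picked up a cone with needle shaped scales"
best = lesk(["cone#1", "cone#2"], context)
print(best)
```

In this toy setting neither bare definition of "cone" shares a word with the context, so only the extension through the related sense separates the candidates. Passing `tokenize=char_bigrams` swaps in a different tokenization scheme without changing the scoring.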