Gloss-Based Semantic Similarity Metrics for Predominant Sense Acquisition

Recent years have seen a variety of approaches to the automatic acquisition of the predominant senses of words. Given the Zipfian distribution of word senses, this information can serve as a powerful backoff strategy for word sense disambiguation. Approaches that do not require manually sense-tagged data have been proposed for English, exploiting available lexical resources, notably WordNet. In these approaches, distributional similarity is coupled with a semantic similarity measure that ties the distributionally related words to the sense inventory. The semantic similarity measures used so far have all taken advantage of the hierarchical information in WordNet. We investigate the applicability of this approach to Japanese and demonstrate the feasibility of a measure that uses only the information in dictionary definitions, in contrast with previous work on English, which uses hierarchical information in addition to dictionary definitions. We extend the definition-based semantic similarity measure by applying distributional similarity to the words in different definitions. This increases the recall of our method and, in some cases, its precision as well.
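The coupling of distributional and definition-based similarity described above can be illustrated with a minimal sketch. It is not the authors' implementation: the ranking scheme (each distributional neighbour distributes its weight across the target word's senses in proportion to gloss similarity) is an assumption modelled on the prior English work the abstract alludes to, and the function names, toy glosses, and neighbour weights are all hypothetical. English tokens are used for readability; Japanese definitions would first need word segmentation.

```python
from collections import Counter

def gloss_overlap(gloss_a, gloss_b):
    """Count word tokens shared by two glosses (a simple Lesk-style overlap)."""
    a, b = Counter(gloss_a.split()), Counter(gloss_b.split())
    return sum((a & b).values())

def rank_senses(target_glosses, neighbours, neighbour_glosses):
    """Rank the senses of a target word.

    target_glosses:    {sense_id: gloss} for the target word
    neighbours:        [(neighbour_word, distributional_similarity), ...]
    neighbour_glosses: {neighbour_word: [gloss of each of its senses]}
    """
    scores = {sense: 0.0 for sense in target_glosses}
    for word, dist_sim in neighbours:
        # Similarity of each target sense to the neighbour: best match
        # over the neighbour's own senses.
        sem_sim = {
            sense: max(
                (gloss_overlap(gloss, ng) for ng in neighbour_glosses.get(word, [])),
                default=0,
            )
            for sense, gloss in target_glosses.items()
        }
        total = sum(sem_sim.values())
        if total == 0:
            continue  # neighbour shares no gloss words with any sense
        for sense in scores:
            scores[sense] += dist_sim * sem_sim[sense] / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data, for illustration only.
target_glosses = {
    "bank_money": "institution that accepts deposits and lends money",
    "bank_river": "sloping land beside a river",
}
neighbours = [("lender", 0.4), ("shore", 0.2)]
neighbour_glosses = {
    "lender": ["person or institution that lends money"],
    "shore": ["land along the edge of a river or sea"],
}

ranked = rank_senses(target_glosses, neighbours, neighbour_glosses)
print(ranked)
```

The first entry of `ranked` is the predicted predominant sense. Neighbours whose glosses share no words with any sense contribute nothing, which is one plausible reason a purely definition-based measure can have limited recall; the extension mentioned in the abstract (applying distributional similarity to words in different definitions) is not implemented in this sketch. A realistic version would also filter function words before counting overlaps.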
