Paper Information - Gloss-Based Semantic Similarity Metrics for Predominant Sense Acquisition

In recent years there have been various approaches aimed at the automatic acquisition of predominant senses of words. This information can be exploited as a powerful backoff strategy for word sense disambiguation, given the Zipfian distribution of word senses. Approaches which do not require manually sense-tagged data have been proposed for English, exploiting available lexical resources, notably WordNet. In these approaches, distributional similarity is coupled with a semantic similarity measure which ties the distributionally related words to the sense inventory. The semantic similarity measures that have been used have all taken advantage of the hierarchical information in WordNet. We investigate the applicability of this approach to Japanese and demonstrate the feasibility of a measure which uses only information in the dictionary definitions, in contrast with previous work on English, which uses hierarchical information in addition to dictionary definitions. We extend the definition-based semantic similarity measure with distributional similarity applied to the words in different definitions. This increases the recall of our method and, in some cases, its precision as well.
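The coupling of distributional similarity with a definition-based measure described in the abstract can be illustrated with a minimal sketch. It does not reproduce the paper's implementation: the abstract gives no formula, so the scoring below assumes the commonly used prevalence-style ranking, in which each distributional neighbour distributes its similarity weight over the target word's senses in proportion to a Lesk-style gloss overlap. The function names, the English glosses, and the neighbour weights are all invented for illustration. Whitespace tokenisation is also an assumption; Japanese definitions would first need morphological segmentation.

```python
from collections import Counter


def gloss_overlap(gloss_a, gloss_b):
    """Lesk-style similarity: number of word tokens shared by two glosses."""
    a = Counter(gloss_a.lower().split())
    b = Counter(gloss_b.lower().split())
    return sum((a & b).values())  # multiset intersection size


def sense_to_word_similarity(sense_gloss, neighbour_glosses):
    """Similarity of one sense to a neighbour word: best match over the neighbour's senses."""
    return max((gloss_overlap(sense_gloss, g) for g in neighbour_glosses), default=0)


def rank_senses(target_glosses, neighbours, inventory):
    """Rank the senses of a target word (assumed prevalence-style scoring).

    target_glosses: dict mapping sense id -> definition text
    neighbours:     list of (word, distributional similarity to the target)
    inventory:      dict mapping word -> list of definition texts
    """
    scores = {sense: 0.0 for sense in target_glosses}
    for word, dist_sim in neighbours:
        sims = {
            sense: sense_to_word_similarity(gloss, inventory.get(word, []))
            for sense, gloss in target_glosses.items()
        }
        total = sum(sims.values())
        if total == 0:
            continue  # neighbour shares no definition words with any sense
        for sense, sim in sims.items():
            # Each neighbour splits its weight across senses by normalised overlap.
            scores[sense] += dist_sim * sim / total
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy data, invented for illustration only.
target_glosses = {
    "bank#1": "a financial institution that accepts deposits and lends money",
    "bank#2": "sloping land beside a river or lake",
}
inventory = {
    "lender": ["a person or institution that lends money"],
    "fund": ["a reserve of money set aside for a purpose"],
    "shore": ["the land along the edge of a lake or sea"],
}
neighbours = [("lender", 0.30), ("fund", 0.25), ("shore", 0.10)]

ranking = rank_senses(target_glosses, neighbours, inventory)
print(ranking[0][0])  # highest-scoring sense under this toy data
```

In this sketch a neighbour whose definitions share no words with any sense contributes nothing, which is one way a purely definition-based measure can lose recall. The extension mentioned in the abstract, which applies distributional similarity to the words in different definitions, is not implemented here.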

Ryu Iida | Diana McCarthy | Rob Koeling
