Paper information - Sense-based clustering of Polish nouns in the extraction of semantic relatedness

Sense-based clustering of Polish nouns in the extraction of semantic relatedness

The construction of a wordnet from scratch requires intelligent software support. An accurate measure of semantic relatedness can be used to extract groups of semantically close words from a corpus. Such groups help a lexicographer make decisions about synset membership and synset placement in the network. We have adapted to Polish the well-known algorithm of Clustering by Committee, and tested it on the largest Polish corpus available. The evaluation by way of a plWordNet-based synonymy test used Polish WordNet, a resource still under development. The results are consistent with a few benchmarks, but not encouraging enough yet to make a wordnet writer's support tool immediately useful.

Stan Szpakowicz | Maciej Piasecki | Bartosz Broda

[1] Maciej Piasecki, et al. Extended Similarity Test for the Evaluation of Semantic Similarity Functions, 2007.

[2] Patrick Pantel, et al. Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations, 2006, ACL.

[3] Patrick Pantel, et al. Discovering word senses from text, 2002, KDD.

[4] Baowen Xu, et al. Reasoning within the Extended Fuzzy Description Logics with Restricted Terminological Boxes, 2007.

[5] Edmond Chow, et al. New Experiments in Distributional Representations of Synonymy, 2005, CoNLL.

[6] George Karypis, et al. CLUTO - A Clustering Toolkit, 2002.

[7] Hang Li, et al. A Probabilistic Approach to Lexical Semantic Knowledge Acquisition and Structural Disambiguation, 1998, ArXiv.

[8] Patrick Pantel, et al. Clustering by committee, 2003.

[9] Hitoshi Isahara, et al. Clustering Using Feature Domain Similarity to Discover Word Senses for Adjectives, 2007, International Conference on Semantic Computing (ICSC 2007).

[10] Maciej Piasecki, et al. Words, Concepts and Relations in the Construction of Polish WordNet, 2008.

[11] Ted Pedersen, et al. Unsupervised Corpus-Based Methods for WSD, 2007.

[12] Eneko Agirre, et al. Word Sense Disambiguation: Algorithms and Applications, 2007.

[13] Stan Szpakowicz, et al. Corpus-based Semantic Relatedness for the Construction of Polish WordNet, 2008, LREC.

[14] Eneko Agirre, et al. Word Sense Disambiguation: Algorithms and Applications (Text, Speech and Language Technology), 2006.

[15] Stan Szpakowicz, et al. Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns, 2007, TSD.