Corpus-based Semantic Relatedness for the Construction of Polish WordNet

The construction of a wordnet, a labour-intensive enterprise, can be significantly assisted by automatic grouping of lexical material and discovery of lexical semantic relations. The objective is to ensure that automatically acquired results are of high quality before they are presented for lexicographers’ approval. We discuss a software tool that suggests synset members for a given verb or adjective using a measure of semantic relatedness; this extends previous work on nominal synsets in Polish WordNet. Syntactically motivated constraints are applied to a large morphologically annotated corpus of Polish. Evaluation has been performed via the WordNet-Based Similarity Test and additionally supported by human raters. A lexicographer also manually assessed a sample of suggestions. The results compare favourably with other known methods of acquiring semantic relations.
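To illustrate the general idea of suggesting synset members by semantic relatedness, here is a minimal sketch. It is not the measure used in this work: it assumes plain cosine similarity over raw co-occurrence counts, and the words, the feature names (such as `mod:samochód`), the counts, and the helper names `cosine` and `suggest` are all hypothetical.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(u[f] * v[f] for f in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def suggest(target, vectors, k=3):
    """Return the k words most related to `target`, best first."""
    scored = [
        (word, cosine(vectors[target], vec))
        for word, vec in vectors.items()
        if word != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy data (invented): adjectives described by the nouns they modify.
vectors = {
    "szybki": Counter({"mod:samochód": 4, "mod:bieg": 3, "mod:pociąg": 2}),
    "prędki": Counter({"mod:samochód": 3, "mod:bieg": 2}),
    "zielony": Counter({"mod:trawa": 5, "mod:samochód": 1}),
}

if __name__ == "__main__":
    for word, score in suggest("szybki", vectors):
        print(f"{word}\t{score:.3f}")
```

In a setting like the one described, the ranked list would go to a lexicographer for approval rather than being added to a synset automatically.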
