Words, Concepts and Relations in the Construction of Polish WordNet

A Polish WordNet has been under construction for two years. We discuss the organisation of the project, the fundamental assumptions, the tools and the resources. We show how our work differs from that done on EuroWordNet and BalkaNet. In a year we expect the network to reach 20000 lexical units. Some 12000 entries will have been completed by hand. Work on others will be automated as far as possible; to that end, we have developed statistics-based semantic similarity functions and methods based on a form of chunking. The preliminary results show that at least semi-automated acquisition of relations is feasible, so that the lexicographers' work may be reduced to revision and approval.

1 Organisation of the project

Ever since the initial burst of popularity of the original WordNet [1, 2], there has been little doubt how useful wordnets are in Natural Language Processing. For those who work with a language that lacks a wordnet, the question is not whether, but how and how fast to construct such a lexical resource. The construction is costly, with the bulk of the cost due to the high linguistic workload. This appears to have been the case, in particular, in two multinational wordnet-building projects, EuroWordNet [3] and BalkaNet [4]. The recent developments in automatic acquisition of lexical-semantic relations suggest that the cost might be reduced. Our project to construct a Polish WordNet (plWordNet) explores this path as a supplement to a well-organized and well-supported effort of a team of linguists/lexicographers.

The three-year project started in November 2005. The Polish Ministry of Education and Science funds it with a very modest ca. 65000 euro (net). The stated main objective is the development of algorithms of automatic acquisition

* Work financed by the Polish Ministry of Education and Science, Project No. 3 T11C 018 29.

[1]  D. Tufis,et al.  BalkaNet: Aims, Methods, Results and Perspectives. A General Overview , 2004 .

[2]  George A. Miller,et al.  WordNet: A Lexical Database for the English Language , 2002 .

[3]  Aleš Horák,et al.  New Features of Wordnet Editor VisDic , 2004 .

[4]  Aleš Horák,et al.  DEBVisDic - First Version of New Client-Server Wordnet Browsing and Editing Tool , 2005 .

[5]  Piek Vossen,et al.  EuroWordNet: general document , 2002 .

[6]  Maciej Piasecki,et al.  Recognition of Structured Collocations in An Inflective Language , 2008 .

[7]  Maciej Piasecki,et al.  Environment Supporting Construction of the Polish Wordnet , 2007 .

[8]  Stan Szpakowicz,et al.  Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns , 2007, TSD.

[9]  Karel Pala,et al.  Building Czech Wordnet , 2004 .

[10]  Zellig S. Harris,et al.  Mathematical structures of language , 1968, Interscience tracts in pure and applied mathematics.

[11]  R J Donaldson,et al.  A General Overview , 1980, Royal Society of Health journal.

[12]  Edmond Chow,et al.  New Experiments in Distributional Representations of Synonymy , 2005, CoNLL.

[13]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[14]  Patrick Pantel,et al.  Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations , 2006, ACL.

[15]  Svetla Koeva, Stoyan Mihov, Tinko Tinchev.  Bulgarian Wordnet – Structure and Validation , 2004 .

[16]  Marti A. Hearst.  Automatic Acquisition of Hyponyms from Large Text Corpora , 1992, COLING.

[17]  Maciej Piasecki,et al.  Polish WordNet on a Shoestring , 2007 .

[18]  Helmut Feldweg,et al.  GermaNet - a Lexical-Semantic Net for German , 1997 .

[19]  Maciej Piasecki,et al.  Extended Similarity Test for the Evaluation of Semantic Similarity Functions , 2007 .

[20]  Magnus Sahlgren,et al.  The Word-Space Model: using distributional analysis to represent syntagmatic and paradigmatic relations between words in high-dimensional vector spaces , 2006 .