Automatic Selection of Heterogeneous Syntactic Features in Semantic Similarity of Polish Nouns

We present experiments with a variety of corpus-based measures applied to the problem of constructing semantic similarity functions for Polish nouns. Rich inflection in Polish allows us to acquire useful syntactic features without parsing; morphosyntactic restrictions checked in a large enough window provide sufficiently useful data. A novel feature selection method gives the accuracy of 86% on the WordNet-based synonymy test, an improvement of 5% over the previous results.

[1]  Hinrich Schütze,et al.  Book Reviews: Foundations of Statistical Natural Language Processing , 1999, CL.

[2]  Patrick Pantel,et al.  Espresso: Leveraging Generic Patterns for Automatically Harvesting Semantic Relations , 2006, ACL.

[3]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[4]  Dan I. Moldovan,et al.  Automatic Discovery of Part-Whole Relations , 2006, CL.

[5]  T. Landauer,et al.  A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge. , 1997 .

[6]  Jeffrey P. Bigham,et al.  Combining Independent Modules to Solve Multiple-choice Synonym and Analogy Problems , 2003, ArXiv.

[7]  Dekang Lin,et al.  Using Syntactic Dependency as Local Context to Resolve Word Sense Ambiguity , 1997, ACL.

[8]  Michael W. Berry,et al.  Large-Scale Sparse Singular Value Computations , 1992 .

[9]  Maciej Piasecki,et al.  Effective Architecture of the Polish Tagger , 2006, TSD.

[10]  Maciej Piasecki,et al.  Semantic Similarity Measure of Polish Nouns Based on Linguistic Features , 2007, BIS.

[11]  Gregory Grefenstetti,et al.  Evaluation techniques for automatic semantic extraction: comparing syntactic and window based approaches , 1996 .

[12]  Edmond Chow,et al.  New Experiments in Distributional Representations of Synonymy , 2005, CoNLL.

[13]  Dekang Lin,et al.  Automatic Retrieval and Clustering of Similar Words , 1998, ACL.

[14]  Ido Dagan,et al.  Similarity-Based Methods for Word Sense Disambiguation , 1997, ACL.

[15]  Zellig S. Harris,et al.  Mathematical structures of language , 1968, Interscience tracts in pure and applied mathematics.

[16]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[17]  Pat Langley,et al.  Editorial: On Machine Learning , 1986, Machine Learning.

[18]  Gregory Grefenstette,et al.  Evaluation Techniques for Automatic Semantic Extraction: Comparing Syntactic and Window Based Approaches , 1996 .

[19]  Peter D. Turney Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL , 2001, ECML.

[20]  David J. Weir,et al.  Co-occurrence Retrieval: A Flexible Framework for Lexical Distributional Similarity , 2005, CL.

[21]  Maciej Piasecki,et al.  Polish WordNet on a Shoestring , 2007 .