Gathering Information About Word Similarity from Neighbor Sentences

This paper presents the first results on detecting semantic similarity between words using the Russian translations of the Miller-Charles and Rubenstein-Goodenough word-pair sets, which were prepared for RUSSE-2015, the first evaluation of Russian word semantic similarity. The experiments were carried out on three text collections: Russian Wikipedia, a news collection, and the union of the two. The best results in detecting lexical paradigmatic relations were achieved by combining word2vec with a new type of feature based on word co-occurrences in neighboring sentences.
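As a rough illustration of what a feature based on co-occurrences in neighboring sentences might look like, the sketch below counts unordered word pairs whose members appear in adjacent sentences. The function name, the tokenized input format, and the counting scheme are all assumptions made for this example; the abstract does not specify how the feature is actually computed or how it is combined with word2vec.

```python
from collections import Counter


def neighbor_sentence_cooccurrences(sentences):
    """Count unordered word pairs that span two adjacent sentences.

    `sentences` is a list of tokenized sentences (lists of strings).
    Hypothetical helper: an assumed reading of "co-occurrence in
    neighbor sentences", not the method described in the paper.
    """
    counts = Counter()
    for left, right in zip(sentences, sentences[1:]):
        # Collect each distinct pair once per adjacent sentence pair.
        pairs = {
            tuple(sorted((a, b)))
            for a in set(left)
            for b in set(right)
            if a != b
        }
        counts.update(pairs)
    return counts


# Toy document: three tokenized sentences.
doc = [["cat", "sat"], ["dog", "ran"], ["cat", "slept"]]
counts = neighbor_sentence_cooccurrences(doc)
```

In this toy document, "cat" and "dog" are counted twice because they fall in adjacent sentences on two occasions, while "cat" and "sat" are never counted because they only share a sentence. Counts of this kind could then be normalized and used alongside a word2vec similarity score, though the exact combination is left open here.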
