RUSSE'2018: A Shared Task on Word Sense Induction for the Russian Language

This paper describes the results of the first shared task on word sense induction (WSI) for the Russian language. While similar shared tasks were conducted in the past for some Romance and Germanic languages, we explore the performance of sense induction and disambiguation methods for a Slavic language that shares many features with other Slavic languages, such as rich morphology and virtually free word order. The participants were asked to group the contexts of a given word according to its senses, which were not provided beforehand. For instance, given the word "bank" and a set of contexts for this word, e.g. "bank is a financial institution that accepts deposits" and "river bank is a slope beside a body of water", a participant was asked to cluster these contexts into a number of clusters that is not known in advance; in this case the clusters correspond to the "company" and the "area" senses of the word "bank". For the purpose of this evaluation campaign, we developed three new evaluation datasets based on sense inventories of different sense granularity. The contexts in these datasets were sampled from texts of Wikipedia, the academic corpus of Russian, and an explanatory dictionary of Russian. Overall, 18 teams participated in the competition, submitting 383 models. Multiple teams managed to substantially outperform competitive state-of-the-art baselines from previous years that are based on sense embeddings.
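The task setup described above can be illustrated with a toy sketch. It is not a method used by the organizers or any participant: it greedily groups contexts by the Jaccard overlap of their content words, so the number of clusters is not fixed in advance. The function names, the stopword list, the threshold value, and the last two example contexts are assumptions made for illustration only.

```python
import re

# Hypothetical, deliberately tiny stopword list for the English toy example.
STOPWORDS = {"a", "an", "the", "is", "of", "that", "and", "to", "in"}


def content_words(context, target):
    """Lowercase the context and keep words other than the target and stopwords."""
    tokens = re.findall(r"[a-z]+", context.lower())
    return {t for t in tokens if t != target and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two word sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_senses(contexts, target, threshold=0.1):
    """Assign each context a cluster id; a new cluster is opened whenever
    no existing cluster reaches the similarity threshold."""
    clusters = []  # one accumulated word set per induced sense
    labels = []
    for context in contexts:
        words = content_words(context, target)
        best_id, best_sim = None, threshold
        for cid, vocab in enumerate(clusters):
            sim = jaccard(words, vocab)
            if sim >= best_sim:
                best_id, best_sim = cid, sim
        if best_id is None:
            clusters.append(set(words))
            best_id = len(clusters) - 1
        else:
            clusters[best_id] |= words
        labels.append(best_id)
    return labels


contexts = [
    "bank is a financial institution that accepts deposits",
    "river bank is a slope beside a body of water",
    "the bank accepts deposits and issues loans",   # invented example
    "the river bank slopes down to the water",      # invented example
]
labels = induce_senses(contexts, "bank")
print(labels)  # [0, 1, 0, 1]
```

Here cluster 0 plays the role of the "company" sense and cluster 1 the "area" sense. Systems for a morphologically rich language such as Russian would need at least lemmatization and a far more robust context representation than raw word overlap.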
