Quantitative Properties of Russian Adjective-Noun Collocations across Dictionaries and Corpora

The paper discusses differences between collocations extracted from a number of Russian dictionaries, with attention to their corpus-based frequency characteristics. The aim of the study was, first, to analyze how collocations and set expressions are described in Russian explanatory and specialized dictionaries and to what extent their data coincide, and, second, to investigate how collocations presented in dictionaries are reflected in text corpora. This makes it possible to examine the interrelation between the “manually” collected data and modern corpora (the Russian National Corpus and ruTenTen). We tested the hypothesis that a high collocation frequency corresponds to the item being represented in several dictionaries. We considered 180 collocations built according to the “adjective / participle + noun” model. The results show that the dictionary data are heterogeneous and that the choice of lexical items does not reflect their frequency characteristics: the examples are low-frequency, and about 34% are absent from the disambiguated subcorpus. Explanatory dictionaries and collocation dictionaries show the smallest overlap.