The paper discusses the differences between collocations extracted from a number of Russian dictionaries, paying attention to their corpus-based frequency characteristics. The aim of the study was, first, to analyze how collocations and set expressions are described in Russian explanatory and specialized dictionaries and to what extent these dictionaries agree with one another, and, second, to investigate how collocations presented in dictionaries are reflected in text corpora. This makes it possible to examine the interrelation between the “manually” collected data and modern corpora (the Russian National Corpus and ruTenTen). We tested the hypothesis that a high collocation frequency corresponds to the item being represented in several dictionaries. We considered 180 collocations built according to the “adjective / participle + noun” model. The results show that the dictionary data are heterogeneous and that the choice of lexical items does not match their frequency characteristics: the examples are low-frequency, and about 34% are absent from the disambiguated subcorpus. Explanatory dictionaries and collocation dictionaries show the smallest overlap.