Collocations Dictionary of Modern Slovene KSSS 1.0

The database of the Collocations Dictionary of Modern Slovene 1.0 contains entries for 35,862 headwords (18,043 nouns, 5,148 verbs, 10,259 adjectives and 2,412 adverbs) and 7,310,983 collocations that were automatically extracted from the Gigafida 1.0 corpus.

For the automatic extraction via the Sketch Engine API we used a specially adapted Sketch grammar for Slovene and a set of parameters, chosen on the basis of manual evaluation, that determined: the maximum number of collocates per grammatical relation, the minimum frequency of a collocate, the minimum frequency of a grammatical relation, the minimum salience (logDice) score of a collocate, and the minimum salience of a grammatical relation.

The automatic extraction produced a list of collocates (lemmas) in a particular relation. It was followed by a set of post-processing steps:
- removal of collocations that were represented by repetitions of the same sentence
- preparation of full collocations by the addition of the headword and, if needed, the third element in the grammatical relation (such as a preposition). The headwords/collocates were also put in the correct case, depending on the grammatical relation.
- addition of IDs from the Slovenian morphological lexicon Sloleks (http://hdl.handle.net/11356/1230) to every element in the collocation.
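
The collocate-level thresholds described above can be illustrated with a minimal Python sketch. It is not the extraction pipeline used for the dictionary: the names `Candidate` and `filter_collocates` and all threshold values are hypothetical, chosen only for illustration, since the actual parameter values are not given here. The `log_dice` function follows the standard published logDice definition, 14 + log2(2·f_xy / (f_x + f_y)); the relation-level thresholds are left out for brevity.

```python
import math
from dataclasses import dataclass


def log_dice(f_xy: int, f_x: int, f_y: int) -> float:
    """Standard logDice salience: 14 + log2(2 * f_xy / (f_x + f_y)).

    f_xy is the frequency of the headword and collocate together in the
    grammatical relation; f_x and f_y are their individual frequencies.
    """
    return 14 + math.log2(2 * f_xy / (f_x + f_y))


@dataclass
class Candidate:
    """One collocate candidate for a headword in one grammatical relation
    (hypothetical structure, not the dictionary's actual data model)."""
    collocate: str
    frequency: int
    score: float  # logDice


def filter_collocates(candidates, min_freq, min_score, max_collocates):
    """Keep candidates that meet both minimums, then keep the top
    max_collocates by salience. All thresholds are illustrative."""
    kept = [c for c in candidates
            if c.frequency >= min_freq and c.score >= min_score]
    kept.sort(key=lambda c: c.score, reverse=True)
    return kept[:max_collocates]


# Made-up candidates; real input would come from the corpus query results.
candidates = [
    Candidate("a", 10, 8.0),
    Candidate("b", 2, 9.0),   # dropped: frequency below min_freq
    Candidate("c", 20, 5.0),  # dropped: score below min_score
    Candidate("d", 15, 7.5),
]
selected = filter_collocates(candidates, min_freq=5, min_score=6.0,
                             max_collocates=2)
print([c.collocate for c in selected])
```

With these made-up values, the candidates `a` and `d` survive both minimums and are returned in order of decreasing salience.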