Heuristics-Based Replenishment of Collocation Databases

Collocations are defined as syntactically linked and semantically plausible combinations of content words. Since collocations constitute the bulk of common texts and depend on the language, compiling collocation databases (CDBs) is prohibitively expensive. We present heuristics for the automatic generation of new Spanish collocations based on those already present in a CDB, with the help of a WordNet-like thesaurus: if a word A is semantically "similar" to a word B and a collocation B + C is known, then A + C is presumably a collocation of the same type, provided certain conditions are met.
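
The inference rule stated above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function name `generate_candidates`, the `(headword, collocate, type)` tuple layout, the `similar` mapping standing in for the thesaurus, the `accept` predicate standing in for the unspecified "certain conditions", and the toy Spanish data. It is not the authors' implementation.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# Hypothetical representation: (headword, collocate, collocation type).
Collocation = Tuple[str, str, str]


def generate_candidates(
    known: Iterable[Collocation],
    similar: Dict[str, List[str]],
    accept: Callable[[str, str, str, str], bool] = lambda a, b, c, t: True,
) -> Set[Collocation]:
    """Propose A + C for every known B + C where A is listed as similar to B.

    `similar` maps a word B to the words A considered similar to it
    (a stand-in for a WordNet-like thesaurus lookup).
    `accept` is a placeholder for the extra conditions, which the
    abstract does not spell out; by default it accepts everything.
    """
    known_set = set(known)
    candidates: Set[Collocation] = set()
    for b, c, ctype in known_set:
        for a in similar.get(b, ()):
            candidate = (a, c, ctype)
            # Skip collocations the database already contains.
            if candidate not in known_set and accept(a, b, c, ctype):
                candidates.add(candidate)
    return candidates


# Toy data, invented for illustration only.
known = {("café", "cargado", "noun+adj")}
similar = {"café": ["té"]}

new_collocations = generate_candidates(known, similar)
print(new_collocations)
```

With this toy input the only proposed candidate is "té cargado", inheriting the type of "café cargado". Passing a stricter `accept` predicate is where any filtering conditions would plug in.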