Morphological patterns in the Sloleks lexicon: an initial set for nouns

This paper presents the first step towards expanding the Sloleks lexicon of Slovene with morphological patterns, starting with nouns. In the first phase, the patterns were extracted automatically from the lexicon based on a selection of differentiating characteristics (morphosyntactic tags and variant word parts). This was followed by a manual categorization in which we (a) separated patterns that are either systemic or based on actual language use from examples extracted because of noise attributable to either the extraction method or inconsistencies in Sloleks; (b) arranged patterns into groups based on their content and relatedness; (c) analyzed and more clearly defined form variability, covering both standard and non-standard word forms; and (d) proposed future steps for developing the extraction method and upgrading the lexicon. The result is a set of formalized morphological patterns for (common and proper) nouns comprising 10 groups (64 patterns) for masculine nouns, 9 groups (29 patterns) for feminine nouns, and 8 groups (20 patterns) for neuter nouns. Preparing the set also yielded numerous suggestions for upgrading the lexicon, while a machine-focused view of morphological inflection offers opportunities to improve the current grammatical description of Slovene. In future work, we intend to expand the set of patterns to other parts of speech and to corpus-based material. The final categorization of patterns will be included in the Sloleks lexicon, and the patterns will also be published in the CLARIN.SI repository in a machine-readable format.
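
To make the automatic extraction step more concrete, the following is a minimal sketch of how patterns might be derived from tags and variant word parts. It is an illustration under stated assumptions, not the procedure used for Sloleks: the tag labels (`nom.sg`, `gen.sg`, `dat.sg`), the toy lexicon, and the function names `extract_pattern` and `group_by_pattern` are all hypothetical, and the variant part of each form is approximated as whatever follows the longest common prefix of the paradigm.

```python
import os
from collections import defaultdict

# Hypothetical toy lexicon: lemma -> list of (tag, word form).
# The tags are simplified placeholders, not actual Sloleks morphosyntactic tags.
TOY_LEXICON = {
    "miza": [("nom.sg", "miza"), ("gen.sg", "mize"), ("dat.sg", "mizi")],
    "hiša": [("nom.sg", "hiša"), ("gen.sg", "hiše"), ("dat.sg", "hiši")],
    "korak": [("nom.sg", "korak"), ("gen.sg", "koraka"), ("dat.sg", "koraku")],
}


def extract_pattern(forms):
    """Return a pattern as a sorted tuple of (tag, variant part).

    Assumption: the invariant part is the longest common prefix of all
    forms, and the variant part is the remainder of each form.
    """
    stem = os.path.commonprefix([form for _, form in forms])
    return tuple(sorted((tag, form[len(stem):]) for tag, form in forms))


def group_by_pattern(lexicon):
    """Group lemmas that share an identical (tag, variant part) pattern."""
    groups = defaultdict(list)
    for lemma, forms in lexicon.items():
        groups[extract_pattern(forms)].append(lemma)
    return dict(groups)


if __name__ == "__main__":
    for pattern, lemmas in group_by_pattern(TOY_LEXICON).items():
        print(lemmas, pattern)
```

In this sketch, "miza" and "hiša" fall into one pattern and "korak" into another. A plain common-prefix split does not account for stem alternations, so a lemma with a changing stem would yield a pattern of its own; that is one way extraction noise of the kind described above could arise, and manual categorization would still be needed to separate it from systemic patterns.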