PROCESSING OF CORPORA OF SERBIAN USING ELECTRONIC DICTIONARIES

Among language resources we distinguish corpora, on the one hand, and dictionaries and grammars, on the other. Dictionaries and grammars are slow to construct because most of the work is done manually (Laporte 2009). Their interaction with corpora, however, enables more sophisticated processing that cannot easily be achieved without them. The existing Serbian language resources can also be analysed in this light (Vitas et al. 2003b). This paper describes the manually constructed resources for Serbian, namely the system of morphological electronic dictionaries and semantic networks, and then reviews some of the Serbian language corpora. The aim is to demonstrate how corpora can be successfully exploited using high-recall tagging, which differs significantly from the mainstream approach based on one-to-one tagging prior to any processing.