Information Extraction from the Weather Reports in Serbian

In this paper, we describe a process for extracting information from meteorological texts in Serbian. The text corpus consists of almost 46,000 sentences. Taking into account the specifics of Serbian and the characteristics of the meteorological sublanguage, we develop a classification schema for structuring the extracted information and transducers for annotating pieces of information in the corpus. We describe the transducer for extracting information about daily temperatures and give evaluation parameters for all other transducers used in the information extraction process.