An Annotated Corpus for Development of Modern Cadastral Information Systems

Development of modern Cadastral Information Systems (CIS) requires tools for automatic estimation of real-estate value, which is influenced by a number of factors. Once these factors have been identified, appropriate information on specific locations needs to be acquired. Since the most up-to-date information is transmitted mainly as free-text documents via online media, information extraction technology plays a key role in converting such data into valuable, structured knowledge, which facilitates automatic real-estate value estimation. This article reports on the creation of a corpus of Polish free-text documents, tagged with name mentions of CIS-relevant entities, which constitutes a core resource for the development and evaluation of information extraction components used within a cadastre framework.