Paper Information - An Annotated Corpus for Development of Modern Cadastral Information Systems

Developing modern Cadastral Information Systems (CIS) requires tools for automatically estimating real estate value, which is influenced by a number of factors. Once these factors have been identified, appropriate information on specific locations needs to be acquired. Since the most up-to-date information is transmitted mainly as free-text documents via online media, information extraction technology plays a key role in converting such data into valuable, structured knowledge that facilitates automatic real estate value estimation. This article reports on the creation of a corpus of Polish free-text documents tagged with name mentions of CIS-relevant entities, which constitutes a core resource for developing and evaluating information extraction components used within a cadastre framework.

Agata Filipowska | Jakub Piskorski | Krzysztof Wecel | Karol Wieloch
