Ranking Georeferences for Efficient Crowdsourcing of Toponym Annotations in a Historical Corpus of Alpine Texts

This paper presents a simple method for ranking georeference candidates to support the workflow of a citizen science web application for toponym annotation in historical texts. We implement the general idea of efficient crowdsourcing in which human and artificial intelligence work hand in hand. For named entity recognition, we apply recent neural NER taggers based on pretraining. For named entity linking to geographical knowledge bases, we report on georeference ranking experiments that test the hypothesis that textual proximity indicates geographic proximity. Simulations with online reranking, which immediately integrates user verifications into the ranking, show further improvements.
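The proximity hypothesis can be illustrated with a minimal sketch. It is not the paper's implementation: the `Candidate` class, the `rank_candidates` function, the inverse-distance weighting by token offset, and the rough coordinates are all assumptions made for illustration. Candidates for a mention are ordered by their weighted mean great-circle distance to already verified toponyms ("anchors"), where anchors that occur closer in the text receive a higher weight. Online reranking then amounts to appending each newly verified toponym to the anchor list and ranking again.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A hypothetical georeference candidate from a knowledge base."""
    name: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def rank_candidates(candidates, mention_pos, anchors):
    """Sort candidates by weighted mean distance to verified anchors.

    anchors: list of (token_position, lat, lon) for verified toponyms.
    The weight 1 / (1 + token offset) is an assumed choice, not taken
    from the paper. Without anchors the input order is kept.
    """
    def score(cand):
        total, total_weight = 0.0, 0.0
        for pos, lat, lon in anchors:
            weight = 1.0 / (1 + abs(pos - mention_pos))
            total += weight * haversine_km(cand.lat, cand.lon, lat, lon)
            total_weight += weight
        return total / total_weight if total_weight else 0.0

    return sorted(candidates, key=score)


# Illustrative data with rough coordinates (assumed, not from the corpus).
candidates = [
    Candidate("far candidate", 47.4, 8.5),
    Candidate("near candidate", 46.0, 7.7),
]
anchors = [(10, 45.98, 7.66)]  # one verified toponym at token position 10
ranked = rank_candidates(candidates, mention_pos=14, anchors=anchors)

# Online reranking: a further user verification becomes a new anchor.
anchors.append((16, 46.02, 7.75))
reranked = rank_candidates(candidates, mention_pos=14, anchors=anchors)
```

In this toy setting the candidate near the verified anchors is ranked first, so an annotator would see the most plausible georeference at the top of the list.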
