Towards semi-automatic annotation of toponyms on old maps

Present-day map digitization methods produce data that is semantically opaque: to a machine, a digitized map is merely a collection of bits and bytes. The area it depicts, the places it mentions, and any text contained within legends or written in its margins remain unknown unless a human appraises the image and manually adds this information to its metadata. This problem is especially severe in the case of old maps: these are typically handwritten, may contain text in varying orientations and sizes, and can be in poor condition due to varying levels of deterioration or damage. As a result, searching the contents of these documents remains challenging, which makes them hard for users to discover, unusable for machine processing and analysis, and thus effectively lost to many forms of public, scientific or commercial utilization. Fully automatic detection and transcription of place names and legends is likely not achievable with today's technology. We argue, however, that semi-automated methods can eliminate much of the tedious effort required to annotate map scans entirely by hand. In this paper, we showcase early work on semi-automatic place name annotation. In our experiment, we utilize open source tools to identify locations on the map that potentially represent toponyms. We present how, in next steps, we aim to extend our experiment by exploiting the spatial layout of the identified candidates to deduce possible place names based on existing toponym lists. Ultimately, our goal is to combine this work with a toolset for manual image annotation into a convenient online environment. This will allow curators, researchers, and potentially also the general public to rapidly “tag” and annotate toponyms on digitized maps.