Journeys of the Past: A Hidden Markov Approach to Georeferencing Historical Itineraries

Many historical itineraries describe routes as a list of settlements together with the travel distances along the way. They are an important source for research in the humanities, providing insights into, for example, the development of human mobility and historical road networks. In this paper, we develop an approach for aligning these itineraries with a modern gazetteer (a database of places). We combine textual information (historical toponyms) and spatial information (travel distances) in a hidden Markov model. Naively computing a maximum-likelihood explanation is slow, but careful algorithm engineering yields performance high enough for interactive use. We demonstrate the practical potential of our approach by georeferencing 48 itineraries (containing 691 stops) from two important historical guidebooks published in 1563 and 1597; the approach proves both fast and accurate. Additionally, we show how sensitivity analysis can power an efficient user interface for quality assurance.
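To make the modelling idea concrete, here is a minimal sketch of how toponym similarity and travel distances could be combined in a hidden Markov model and decoded with the standard Viterbi algorithm. It is an illustration under stated assumptions, not the engineered algorithm the abstract refers to: the emission score (a `difflib` string-similarity ratio), the Gaussian-style distance penalty with `sigma_km = 10`, the use of straight-line distances in kilometres, the toy gazetteer, and all function names are hypothetical choices made for this example.

```python
import math
from difflib import SequenceMatcher


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def emission_logp(toponym, name):
    """Assumed emission score: log of a string-similarity ratio (textual evidence)."""
    sim = SequenceMatcher(None, toponym.lower(), name.lower()).ratio()
    return math.log(max(sim, 1e-6))  # floor avoids log(0) for dissimilar names


def transition_logp(stated_km, a, b, sigma_km=10.0):
    """Assumed transition score: Gaussian-style penalty on the mismatch between
    the stated travel distance and the straight-line distance (spatial evidence)."""
    err = haversine_km(a, b) - stated_km
    return -0.5 * (err / sigma_km) ** 2


def viterbi(stops, gazetteer):
    """Most likely gazetteer entry for each stop.

    stops:     list of (historical toponym, stated km from the previous stop);
               the distance of the first stop is ignored.
    gazetteer: list of (modern name, (lat, lon)).
    """
    n = len(gazetteer)
    score = [emission_logp(stops[0][0], name) for name, _ in gazetteer]
    back = []  # back[t][j] = best predecessor of candidate j at stop t + 1
    for toponym, dist in stops[1:]:
        new_score, pointers = [], []
        for name, pos in gazetteer:
            # Naive inner loop over all predecessors: O(n^2) work per stop.
            cand = [score[i] + transition_logp(dist, gazetteer[i][1], pos)
                    for i in range(n)]
            best_i = max(range(n), key=cand.__getitem__)
            new_score.append(cand[best_i] + emission_logp(toponym, name))
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final candidate.
    j = max(range(n), key=score.__getitem__)
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return [gazetteer[i][0] for i in path]


# Toy data (approximate coordinates, invented spellings and distances).
gazetteer = [
    ("Frankfurt am Main", (50.11, 8.68)),
    ("Frankfurt (Oder)", (52.34, 14.55)),
    ("Mainz", (50.00, 8.27)),
    ("Worms", (49.63, 8.36)),
]
stops = [("Franckfurt", 0.0), ("Meintz", 32.0), ("Wurms", 42.0)]
route = viterbi(stops, gazetteer)
```

In this toy example the spelling "Franckfurt" alone cannot reliably separate the two Frankfurt entries, but the stated 32 km to the next stop fits only Frankfurt am Main. The doubly nested loop over candidates is the kind of naive computation that becomes slow on a full gazetteer, which is where the algorithm engineering mentioned above would come in.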
