Text-Driven Toponym Resolution using Indirect Supervision

Toponym resolvers identify the specific locations referred to by ambiguous place names in text. Most resolvers rely on heuristics that use spatial relationships between multiple toponyms in a document, or on metadata such as population. This paper shows that text-driven disambiguation of toponyms is far more effective than these heuristics. We exploit document-level geotags to indirectly generate training instances for text classifiers for toponym resolution, and show that textual cues can be straightforwardly integrated with other commonly used ones. Results are given for both 19th-century texts pertaining to the American Civil War and 20th-century newswire articles.
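
As a rough illustration of how document-level geotags might yield training instances without hand-labeled toponyms, the sketch below labels each toponym mention with the gazetteer candidate closest to the document's geotag and keeps the surrounding words as features. It is a minimal sketch under stated assumptions, not the procedure used in the paper: the toy `GAZETTEER`, the `make_instances` helper, the context `window`, and the `max_km` cutoff are all hypothetical.

```python
import math
from collections import namedtuple

Candidate = namedtuple("Candidate", "name lat lon")

# Hypothetical toy gazetteer: one ambiguous toponym with two candidate referents.
GAZETTEER = {
    "springfield": [
        Candidate("Springfield, IL", 39.80, -89.64),
        Candidate("Springfield, MA", 42.10, -72.59),
    ],
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def make_instances(text, doc_lat, doc_lon, window=3, max_km=200.0):
    """Return (context_words, label) pairs for toponyms found in a geotagged document.

    The label is the candidate nearest the document geotag (an assumed
    labeling rule); mentions with no candidate within max_km are skipped.
    """
    tokens = [t.strip(".,;:") for t in text.lower().split()]
    instances = []
    for i, word in enumerate(tokens):
        candidates = GAZETTEER.get(word)
        if not candidates:
            continue
        nearest = min(
            candidates,
            key=lambda c: haversine_km(doc_lat, doc_lon, c.lat, c.lon),
        )
        if haversine_km(doc_lat, doc_lon, nearest.lat, nearest.lon) > max_km:
            continue  # the geotag gives no usable evidence for this mention
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        instances.append((context, nearest.name))
    return instances


# A document geotagged in central Illinois produces an instance labeled "Springfield, IL".
illinois_doc = make_instances(
    "The legislature met in Springfield near the old courthouse.", 39.78, -89.65
)
```

The resulting pairs could then be fed to any standard text classifier; which classifier and which features to use are left open here.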
