Spatial Language Representation with Multi-Level Geocoding

We present a multi-level geocoding model (MLG) that learns to associate texts with geographic locations. The Earth's surface is represented using space-filling curves that decompose the sphere into a hierarchy of similarly sized, non-overlapping cells. MLG balances generalization and accuracy by combining losses across multiple levels and predicting cells at each level simultaneously. Without any dataset-specific tuning, MLG obtains state-of-the-art results for toponym resolution on three English datasets. Furthermore, it obtains large gains without any knowledge base metadata, demonstrating that it can effectively learn the connection between text spans and coordinates, and can therefore be extended to toponyms not present in knowledge bases.
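The idea of predicting a cell at every level and combining the per-level losses can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a plain latitude/longitude quadtree (`quadtree_cell`, a hypothetical helper) stands in for the space-filling-curve cell hierarchy, so its cells are not similarly sized on the sphere; the per-level losses are cross-entropies combined by an unweighted sum; and the per-level probability tables are supplied by hand rather than produced by a model.

```python
import math


def quadtree_cell(lat, lng, level):
    """Return the id of the cell containing (lat, lng) at the given level.

    Assumption: a simple lat/lng quadtree, not the paper's cell scheme.
    The id has one digit (0-3) per level, so a coarse cell id is always
    a prefix of the ids of the finer cells nested inside it.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    digits = []
    for _ in range(level):
        lat_mid = (lat_lo + lat_hi) / 2
        lng_mid = (lng_lo + lng_hi) / 2
        quadrant = 0
        if lat >= lat_mid:   # northern half of the current cell
            quadrant += 2
            lat_lo = lat_mid
        else:
            lat_hi = lat_mid
        if lng >= lng_mid:   # eastern half of the current cell
            quadrant += 1
            lng_lo = lng_mid
        else:
            lng_hi = lng_mid
        digits.append(str(quadrant))
    return "".join(digits)


def cross_entropy(probs, target):
    """Negative log-probability assigned to the gold cell."""
    return -math.log(probs[target])


def multi_level_loss(level_probs, lat, lng):
    """Sum the cross-entropy over all levels (assumed unweighted).

    level_probs maps level -> {cell id: predicted probability}.
    """
    return sum(
        cross_entropy(probs, quadtree_cell(lat, lng, level))
        for level, probs in level_probs.items()
    )


# Hand-written predictions for a point near Paris at two levels.
predictions = {
    1: {"3": 0.5, "1": 0.5},
    2: {"32": 0.25, "33": 0.75},
}
loss = multi_level_loss(predictions, 48.85, 2.35)
```

In this sketch, coarse levels have few cells and are easy to predict, while fine levels give precise coordinates; summing the losses trains both at once, which is one plausible reading of how combining levels trades off generalization against accuracy.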
