What’s missing in geographical parsing?

Abstract

Geographical data can be obtained by converting place names in free-format text into geographical coordinates. The ability to geo-locate events in textual reports is a valuable source of information in many real-world applications, such as emergency response, real-time geographical analysis of social media events, and understanding location instructions in auto-response systems. However, geoparsing is still widely regarded as a challenge because of domain language diversity, place name ambiguity, metonymic language and limited leveraging of context, as we show in our analysis. Results to date, whilst promising, are obtained on laboratory data and, unlike in wider NLP, are often not cross-compared. In this study, we evaluate and analyse the performance of several leading geoparsers on several corpora and highlight the challenges in detail. We also publish an automatically geotagged Wikipedia corpus to alleviate the dearth of (open source) corpora in this domain.
