Geo‐parsing Messages from Microtext

Widespread use of social media during crises has become commonplace, as shown by the volume of messages during the Haiti earthquake of 2010 and the Japan tsunami of 2011. Location mentions are particularly important in disaster messages because they can show emergency responders where problems have occurred. This article explores the kinds of locations that occur in disaster-related social messages, how well off-the-shelf software identifies those locations, and what is needed to improve automated location identification, called geo-parsing. To do this, we sampled Twitter messages from the February 2011 earthquake in Christchurch, Canterbury, New Zealand. We manually annotated the locations in those messages to create a gold standard against which to measure the locations identified by Named Entity Recognition (NER) software. The Stanford NER software found some locations that were proper nouns, but it did not identify uncapitalized locations, local streets and buildings, or the non-standard place abbreviations and misspellings that are plentiful in microtext. We review how these problems might be solved in software research, and we model a readable crisis map that shows clusters of crisis locations via enlarged place labels.
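The comparison of software output against a manually annotated gold standard can be illustrated with a minimal sketch. It assumes exact, case-insensitive matching of location strings per message and reports precision, recall, and F1; the abstract does not state which matching rule or metrics were actually used, and the function name `score_locations` and the sample tweets below are hypothetical, not data from the study.

```python
def score_locations(gold, predicted):
    """Exact-match precision, recall and F1 over (message_id, location) pairs.

    Assumption: a predicted location counts as correct only if the same
    string (ignoring case) was annotated for the same message.
    """
    gold_set = {(msg_id, loc.lower()) for msg_id, loc in gold}
    pred_set = {(msg_id, loc.lower()) for msg_id, loc in predicted}

    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical annotations: (message id, location mention).
gold = [
    (1, "Christchurch"),
    (1, "Colombo St"),      # local street
    (2, "chch hospital"),   # uncapitalized, abbreviated place name
    (3, "Lyttelton"),
]

# Hypothetical tagger output for the same three messages.
predicted = [
    (1, "Christchurch"),
    (3, "Lyttelton"),
    (3, "Twitter"),         # false positive
]

precision, recall, f1 = score_locations(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In this toy sample the two misses are the street and the uncapitalized abbreviation, the same kinds of mentions the abstract reports as hard for off-the-shelf software. A looser rule, such as partial span overlap, would give different scores.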
