An algorithm for local geoparsing of microtext

The location of the author of a social media message is not necessarily the location that the author writes about in the message. In applications that mine these messages for information, such as tracking news and political events or responding to disasters, it is the geographic content of the message rather than the location of the author that matters. To this end, we present a method to geo-parse the short, informal messages known as microtext. Our preliminary investigation showed that many microtext messages contain place references that are abbreviated, misspelled, or highly localized. Standard geo-parsers miss these references; ours is built to find them. It uses Natural Language Processing methods to identify four kinds of reference: streets and addresses, buildings and urban spaces, toponyms, and place acronyms and abbreviations. It combines heuristics, open-source Named Entity Recognition software, and machine learning techniques. Our primary data consisted of Twitter messages sent immediately following the February 2011 earthquake in Christchurch, New Zealand. On this sample, the algorithm identified locations with an F statistic of 0.85 for streets, 0.86 for buildings, 0.96 for toponyms, and 0.88 for place abbreviations, with a combined average F of 0.90 for identifying places. The same data run through a standard geo-parser, Yahoo! Placemaker, yielded an F statistic of zero for streets and buildings (because Placemaker is designed to find neither streets nor buildings), and an F of 0.67 for toponyms.
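
For readers unfamiliar with the F statistic reported above, the following is a minimal sketch of how such a score is commonly computed from match counts. It assumes the balanced F1 form (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name `f_measure` and the example counts are hypothetical and are not taken from the paper's data.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Return the F-beta score for one category of place reference.

    Assumption: beta=1.0 (balanced F1); the abstract does not specify this.
    By convention here, the score is 0.0 when nothing is predicted, nothing
    is present in the gold standard, or no prediction is correct.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # share of predictions that are correct
    recall = true_positives / actual        # share of gold references that were found
    if precision + recall == 0:
        return 0.0
    beta_sq = beta * beta
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical counts for illustration only (not the paper's data).
street_score = f_measure(true_positives=85, false_positives=15, false_negatives=15)

# A parser that returns no street matches at all scores zero under this convention.
no_match_score = f_measure(true_positives=0, false_positives=0, false_negatives=40)
```

Under this convention, a tool that never emits a given category (as Placemaker does for streets and buildings) receives an F of zero for that category regardless of how many references the data contain.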
