Geolocation System with Applications to Public Health

Public health applications using social media often require accurate, broad-coverage location information. However, the standard information provided by social media APIs, such as Twitter, cover a limited number of messages. This paper presents Carmen, a geolocation system that can determine structured location information for messages provided by the Twitter API. Our system utilizes geocoding tools and a combination of automatic and manual alias resolution methods to infer location structures from GPS positions and user-provided profile data. We show that our system is accurate and covers many locations, and we demonstrate its utility for improving influenza surveillance.

[1]  Brendan T. O'Connor,et al.  From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series , 2010, ICWSM.

[2]  Brendan T. O'Connor,et al.  A Latent Variable Model for Geographic Lexical Variation , 2010, EMNLP.

[3]  Yutaka Matsuo,et al.  Earthquake shakes Twitter users: real-time event detection by social sensors , 2010, WWW '10.

[4]  Lars Backstrom,et al.  Find me if you can: improving geographical prediction with social and spatial proximity , 2010, WWW '10.

[5]  Aron Culotta,et al.  Towards detecting influenza epidemics by analyzing Twitter messages , 2010, SOMA '10.

[6]  Kyumin Lee,et al.  You are where you tweet: a content-based approach to geo-locating twitter users , 2010, CIKM.

[7]  Gisele L. Pappa,et al.  Inferring the Location of Twitter Messages Based on User Relationships , 2011, Trans. GIS.

[8]  Alberto Maria Segre,et al.  The Use of Twitter to Track Levels of Disease Activity and Public Concern in the U.S. during the Influenza A H1N1 Pandemic , 2011, PloS one.

[9]  Christophe G. Giraud-Carrier,et al.  Identifying Health-Related Topics on Twitter - An Exploration of Tobacco-Related Tweets as a Test Topic , 2011, SBP.

[10]  Mizuki Morita,et al.  Twitter Catches The Flu: Detecting Influenza Epidemics using Twitter , 2011, EMNLP.

[11]  Ed H. Chi,et al.  Tweets from Justin Bieber's heart: the dynamics of the location field in user profiles , 2011, CHI.

[12]  Jason Baldridge,et al.  Simple supervised document geolocation with geodesic grids , 2011, ACL.

[13]  N. Heaivilin,et al.  Public Health Surveillance of Dental Pain via Twitter , 2011, Journal of dental research.

[14]  Mark Dredze,et al.  You Are What You Tweet: Analyzing Twitter for Public Health , 2011, ICWSM.

[15]  P. Eke Using Social Media for Research and Public Health Surveillance , 2011, Journal of dental research.

[16]  Nigel Collier,et al.  Uncovering text mining: A survey of current work on web-based epidemic intelligence , 2012, Global public health.

[17]  Yi-Shin Chen,et al.  TweoLocator: a non-intrusive geographical locator system for Twitter , 2012, LBSN '12.

[18]  Michael D. Barnes,et al.  "Right Time, Right Place" Health Communication on Twitter: Value and Accuracy of Location Information , 2012, Journal of medical Internet research.

[19]  Jason Baldridge,et al.  Supervised Text-based Geolocation Using Language Models on an Adaptive Grid , 2012, EMNLP.

[20]  Henry A. Kautz,et al.  Finding your friends and following them to where you are , 2012, WSDM '12.

[21]  Henry A. Kautz,et al.  Modeling Spread of Disease from Social Interactions , 2012, ICWSM.

[22]  Henry A. Kautz,et al.  Predicting Disease Transmission from Geo-Tagged Micro-Blog Data , 2012, AAAI.

[23]  Mark Dredze,et al.  Drug Extraction from the Web: Summarizing Drug Experiences with Multi-Dimensional Topic Models , 2013, NAACL.

[24]  Mourad Oussalah,et al.  A software architecture for Twitter collection, search and geolocation services , 2013, Knowl. Based Syst..

[25]  David Yarowsky,et al.  Broadly Improving User Classification via Communication-Based Name and Location Clustering on Twitter , 2013, NAACL.

[26]  Scott A. Hale,et al.  Where in the World Are You? Geolocation and Language Identification in Twitter* , 2013, ArXiv.

[27]  Mark Dredze,et al.  Separating Fact from Fear: Tracking Flu Infections on Twitter , 2013, NAACL.