Predicting Disease Transmission from Geo-Tagged Micro-Blog Data

Researchers have begun to mine social network data to predict a variety of social, economic, and health-related phenomena. While previous work has focused on predicting aggregate properties, such as the prevalence of seasonal influenza in a given country, we consider the task of fine-grained prediction of the health of specific people from noisy and incomplete data. We construct a probabilistic model that can predict if and when an individual will fall ill, with high precision and good recall, on the basis of their social ties and co-locations with other people, as revealed by their Twitter posts. Our model is highly scalable and can be used to predict general dynamic properties of individuals in large real-world social networks. These results provide a foundation for research on fundamental questions of public health, including the identification of non-cooperative disease carriers ("Typhoid Marys"), adaptive vaccination policies, and our understanding of the emergence of global epidemics from day-to-day interpersonal interactions.
