Health Monitoring on Social Media over Time

Social media has become a major source for analyzing all aspects of daily life. Thanks to dedicated latent topic analysis methods such as the Ailment Topic Aspect Model (ATAM), public health can now be observed on Twitter. In this work, we use social media to monitor people’s health over time. Tweets offer several benefits, including instantaneous data availability at virtually no cost. Early monitoring of health data complements post-factum studies and enables a range of applications, such as measuring behavioral risk factors and triggering health campaigns. We formulate two problems: health transition detection and health transition prediction. To solve the first problem, we propose the Temporal Ailment Topic Aspect Model (TM–ATAM), a new latent model that captures transitions involving health-related topics. TM–ATAM is a non-obvious extension of ATAM, a model designed to extract health-related topics. It learns health-related topic transitions by minimizing the prediction error on topic distributions between consecutive posts at different temporal and geographic granularities. To solve the second problem, we develop T–ATAM, a Temporal Ailment Topic Aspect Model in which time is treated as a random variable natively inside ATAM. Our experiments on an 8-month corpus of tweets show that TM–ATAM outperforms TM–LDA in estimating health-related transitions from tweets for different geographic populations. We examine the ability of TM–ATAM to detect transitions due to climate conditions in different geographic regions. We then show how T–ATAM can be used to predict the most important transition, and we compare T–ATAM with data from the CDC (Centers for Disease Control and Prevention) and Google Flu Trends.
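
The idea of learning topic transitions by minimizing prediction error between consecutive posts can be illustrated with a minimal sketch. It assumes a squared-error (least-squares) objective over a single transition matrix, in the spirit of TM–LDA; the abstract does not state the exact loss used by TM–ATAM, so this is an illustration rather than the authors' method. The function name `fit_transition_matrix` and the synthetic topic distributions are hypothetical.

```python
import numpy as np


def fit_transition_matrix(before, after):
    """Return T minimizing ||before @ T - after||_F (assumed squared-error loss).

    before, after: arrays of shape (n_posts, n_topics); row i of `after` is the
    topic distribution of the post that follows row i of `before`.
    """
    T, *_ = np.linalg.lstsq(before, after, rcond=None)
    return T


# Synthetic data for illustration only; real inputs would be per-post topic
# distributions inferred by a topic model such as ATAM.
rng = np.random.default_rng(0)
n_posts, n_topics = 200, 4

# Hypothetical ground-truth transitions: each row is a probability vector.
true_T = rng.dirichlet(np.ones(n_topics), size=n_topics)

before = rng.dirichlet(np.ones(n_topics), size=n_posts)
after = before @ true_T  # noise-free successors, so T is exactly recoverable

T_hat = fit_transition_matrix(before, after)
prediction_error = np.linalg.norm(before @ T_hat - after)
print("prediction error:", prediction_error)
```

Under these assumptions, fitting one matrix per time period or region and comparing the fitted matrices would be one way to look for changes in transitions; how TM–ATAM actually detects them is described in the paper itself.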
