Extracting health-related causality from Twitter messages using natural language processing

Background: Twitter messages (tweets) cover a wide range of everyday topics, including health-related ones. Analyzing health-related tweets can help us understand the health conditions and concerns people encounter in daily life. In this paper we evaluate an approach to extracting causalities from tweets using natural language processing (NLP) techniques.

Methods: Lexico-syntactic patterns based on dependency parser outputs are used for causality extraction. We focus on three health-related topics: “stress”, “insomnia”, and “headache.” A large dataset of 24 million tweets is used.

Results: The proposed approach achieved an average precision between 74.59% and 92.27% in comparison with human annotations.

Conclusions: Manual analysis of the causalities extracted from tweets reveals interesting findings about how Twitter users express health-related topics.
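To make the Methods and Results statements concrete, the following is a minimal sketch of how a lexico-syntactic pattern over dependency relations might pull a (cause, effect) pair out of a tweet, and how precision against human annotations could be computed. Everything in it is an assumption for illustration: the `Dep` triple type, the `CAUSAL_VERBS` list, the single subject-verb-object pattern, and the hand-written dependency triples that stand in for real parser output. It is not the set of patterns used in the paper.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Dep(NamedTuple):
    """One dependency relation: rel(head, dependent). Hypothetical helper type."""
    rel: str
    head: str
    dependent: str


# Assumed trigger verbs; the paper's actual pattern inventory is not given here.
CAUSAL_VERBS = {"cause", "causes", "caused", "trigger", "triggers", "triggered"}
TOPICS = {"stress", "insomnia", "headache"}  # the three topics named in the abstract


def extract_causality(deps: Iterable[Dep], topics: Set[str]) -> List[Tuple[str, str]]:
    """Apply one illustrative pattern: nsubj(V, cause) + dobj/obj(V, effect),
    where V is a causal verb and at least one argument is a topic word."""
    subjects, objects = {}, {}
    for d in deps:
        verb = d.head.lower()
        if verb not in CAUSAL_VERBS:
            continue
        if d.rel == "nsubj":
            subjects[verb] = d.dependent.lower()
        elif d.rel in ("dobj", "obj"):
            objects[verb] = d.dependent.lower()

    pairs = []
    for verb, cause in subjects.items():
        effect = objects.get(verb)
        if effect is not None and (cause in topics or effect in topics):
            pairs.append((cause, effect))
    return pairs


def precision(extracted: List[Tuple[str, str]], gold: Set[Tuple[str, str]]) -> float:
    """Fraction of extracted pairs that human annotators marked as correct."""
    if not extracted:
        return 0.0
    return sum(1 for pair in extracted if pair in gold) / len(extracted)


# Hand-written triples standing in for a parse of "Stress causes headache".
example_deps = [
    Dep("nsubj", "causes", "Stress"),
    Dep("dobj", "causes", "headache"),
]
example_pairs = extract_causality(example_deps, TOPICS)
```

In a real pipeline the triples would come from a dependency parser rather than being typed by hand, and many more patterns (for example "because of" or "due to" constructions) would presumably be needed to cover how people phrase causes in tweets.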
