Towards the Semantic Interpretation of Personal Health Messages from Social Media

Recent attempts have been made to utilise social media platforms, such as Twitter, to provide early warning and monitoring of health threats in populations (i.e. Internet bio-surveillance). The literature has shown that a system based on keyword matching over social media messages could track influenza activity well ahead of reports from the Centers for Disease Control and Prevention (CDC). However, we argue that simple keyword matching may not capture the semantic interpretation of social media messages that would enable healthcare experts or machines to extract and leverage medical knowledge from them. In this paper, we motivate and describe a new task that aims to close this technology gap by extracting the semantic interpretation of medical terms mentioned in social media messages, which are typically written in layman's language. Achieving this task would enable the automatic integration of data about direct patient experiences, extracted from social media, with existing knowledge from clinical databases, which could lead to advances in the use of community health experiences in healthcare services.
