Survey of Text-based Epidemic Intelligence: A Computational Linguistic Perspective

Epidemic intelligence deals with the detection of disease outbreaks using formal sources of information (such as hospital records) and informal ones (such as user-generated text on the web). In this survey, we discuss approaches to epidemic intelligence that use textual datasets, which we refer to as 'text-based epidemic intelligence'. We view past work in terms of two broad categories: health mention classification (selecting relevant text from a large volume) and health event detection (predicting epidemic events from a collection of relevant text). Our discussion focuses on the computational linguistic techniques underlying the two categories. The survey also provides details of the state of the art in annotation techniques, resources and evaluation strategies for epidemic intelligence.
