Deep learning for pollen allergy surveillance from Twitter in Australia

The paper introduces a deep learning-based approach for real-time detection of, and insight generation about, one of the most prevalent chronic conditions in Australia: pollen allergy. Twitter, a popular social media platform, is used for data collection as a cost-effective and unobtrusive alternative for public health monitoring that complements traditional survey-based approaches. The data were extracted from Twitter using pre-defined keywords (i.e. ‘hayfever’ OR ‘hay fever’) over a period of 6 months covering the high pollen season in Australia. The following deep learning architectures were adopted in the experiments: CNN, RNN, LSTM and GRU. Both default (GloVe) and domain-specific (HF) word embeddings were used to train the classifiers. Standard evaluation metrics (i.e. Accuracy, Precision and Recall) were calculated to validate the results. Finally, a visual correlation with weather variables was performed. The neural network-based approach correctly identified implicit mentions of symptoms and treatments, including previously unseen ones (accuracy up to 87.9% for GRU with 300-dimensional GloVe embeddings). The system addresses the shortcomings of conventional machine learning techniques based on manual feature engineering, which prove limiting when exposed to the wide range of non-standard expressions relating to medical concepts. The case study demonstrates the application of a ‘black-box’ approach to a real-world problem and exposes its internal workings, as a step towards more transparent, interpretable and reproducible decision-making in the health informatics domain.
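As an illustration of two steps named in the abstract, keyword-based tweet selection and the Accuracy, Precision and Recall metrics, a minimal Python sketch follows. It is not the authors' pipeline: the regular expression, the function names `matches_keywords` and `binary_metrics`, and the toy labels are assumptions made here for illustration only.

```python
import re

# Hypothetical pattern mirroring the two query terms in the abstract
# ('hayfever' OR 'hay fever'); the real collection query is not shown in the source.
HAYFEVER_PATTERN = re.compile(r"\bhay\s?fever\b", re.IGNORECASE)


def matches_keywords(text: str) -> bool:
    """Return True if the text mentions 'hayfever' or 'hay fever'."""
    return HAYFEVER_PATTERN.search(text) is not None


def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision and recall for binary labels (1 = relevant)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    total = len(pairs)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Toy examples, invented for this sketch.
    tweets = ["My hayfever is awful today", "Hay fever season again", "I have a fever"]
    print([matches_keywords(t) for t in tweets])
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In practice the predicted labels would come from one of the trained classifiers (CNN, RNN, LSTM or GRU); here they are hard-coded so that the metric definitions can be checked by hand.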
