Neural attention with character embeddings for hay fever detection from Twitter

This paper applies neural networks with character embeddings and an attention mechanism to highly unstructured user-generated content for pollen allergy surveillance. Currently, there is no accurate representation of hay fever prevalence, particularly in real-time scenarios. Social media offers an alternative source of knowledge about the condition, which is valuable for allergy sufferers, general practitioners, and policy makers. Despite this potential, conventional natural language processing methods prove limited when exposed to the challenging nature of user-generated content. As a result, two tasks pose a major problem: detecting actual hay fever instances among the many false positives, and correctly identifying non-technical expressions as pollen allergy symptoms. We propose a deep architecture enhanced with character embeddings and neural attention to improve the classification of hay fever-related content from Twitter data. The improvement in prediction is due to the character-level semantics introduced, which effectively address the out-of-vocabulary problem in our dataset, where the out-of-vocabulary rate is approximately 9%. Overall, the study is a step towards improved real-time pollen allergy surveillance from social media with state-of-the-art technology.
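To make the out-of-vocabulary problem concrete, the following minimal Python sketch shows how an out-of-vocabulary rate can be computed and why a character-level encoding still covers words that a word-level vocabulary misses. It is an illustration only, not the paper's implementation: the function names (`oov_rate`, `build_char_index`, `encode_chars`), the toy vocabulary, and the example tweet are all assumptions introduced here.

```python
def oov_rate(tokens, vocabulary):
    """Return the fraction of tokens that are missing from the word vocabulary."""
    if not tokens:
        return 0.0
    missing = sum(1 for token in tokens if token not in vocabulary)
    return missing / len(tokens)


def build_char_index(texts):
    """Map every character seen in the texts to an integer id.

    Ids 0 and 1 are reserved (by assumption) for padding and unknown characters.
    """
    chars = sorted({ch for text in texts for ch in text})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def encode_chars(word, char_index, unk_id=1):
    """Encode a word as a list of character ids, using unk_id for unseen characters."""
    return [char_index.get(ch, unk_id) for ch in word]


# Hypothetical training text and word-level vocabulary.
train_texts = ["my hay fever is bad today"]
word_vocab = set(train_texts[0].split())

# Hypothetical tweet with non-standard spellings typical of user-generated content.
tweet = "my hayfeverrr is sooo bad today".split()

# Two of the six tokens ("hayfeverrr", "sooo") are unknown at the word level.
rate = oov_rate(tweet, word_vocab)

# At the character level, both unknown words consist only of known characters,
# so a character embedding layer would still receive meaningful ids for them.
char_index = build_char_index(train_texts)
encoded_oov = {w: encode_chars(w, char_index) for w in tweet if w not in word_vocab}
```

In a full model, the character ids would feed an embedding layer whose outputs are combined with word-level features before the attention layer; that part is omitted here because the abstract does not specify it.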
