Identifying Personal Experience Tweets of Medication Effects Using Pre-trained RoBERTa Language Model and Its Updating

Post-market surveillance, the practice of monitoring the safe use of pharmaceutical drugs, is an important part of pharmacovigilance. Collecting personal experiences of pharmaceutical product use could help us gain insight into how the human body reacts to different medications. Twitter, a popular social media service, is being considered as an important alternative data source for collecting personal experience information about medications. Identifying personal experience tweets is a challenging classification task in natural language processing. In this study, we used three methods based on Facebook’s Robustly Optimized BERT Pretraining Approach (RoBERTa) to predict personal experience tweets related to medication use: the first combines the pre-trained RoBERTa model with a classifier; the second combines a classifier with the pre-trained RoBERTa model after updating it on a corpus of unlabeled tweets; and the third combines a classifier with a RoBERTa model trained from scratch on our unlabeled tweets. Our results show that all three approaches outperform the published method (word embedding + LSTM) in classification performance (p < 0.05), and that updating the pre-trained language model with medication-related tweets can improve performance further.
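
Updating a pre-trained RoBERTa model on unlabeled tweets, or training one from scratch, relies on the masked language modeling objective, which needs no labels. The following is a minimal sketch of the dynamic masking step only, not the pipeline used in the study. It assumes the masking scheme described in the BERT and RoBERTa papers (15% of tokens selected; of those, 80% replaced by the mask token, 10% by a random token, 10% left unchanged); the abstract does not state the hyperparameters actually used. The function name `dynamic_mask`, the whitespace tokenization, and the example tweet are hypothetical.

```python
import random

MASK_TOKEN = "<mask>"  # mask token string used by RoBERTa vocabularies


def dynamic_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Corrupt one token sequence for masked language modeling.

    Returns (corrupted, labels). labels[i] is the original token at the
    positions the model must predict, and None everywhere else.
    Assumed scheme (from the BERT/RoBERTa papers, not from this study):
    80% mask token, 10% random token, 10% unchanged.
    """
    if rng is None:
        rng = random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # otherwise keep the original token in place
    return corrupted, labels


# Hypothetical unlabeled tweet, split on whitespace for illustration;
# a real setup would use the model's byte-level BPE tokenizer.
tweet = "this med gave me a headache all day".split()
vocab = sorted(set(tweet))
corrupted, labels = dynamic_mask(tweet, vocab, rng=random.Random(0))
print(corrupted)
print(labels)
```

Calling the function anew each time a tweet is fed to the model produces a different mask per pass, which is what makes the masking dynamic. The updated or newly trained encoder is then combined with a classifier and fine-tuned on labeled tweets.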
