An Ensemble Deep Learning Model for Drug Abuse Detection in Sparse Twitter-Sphere

As the problem of drug abuse intensifies in the U.S., many studies that use social media data, such as Twitter posts, to investigate drug abuse-related activity rely on machine learning as a powerful tool for text classification and filtering. However, because Twitter users post about a wide range of topics, tweets related to drug abuse are rare in most datasets. This class imbalance remains a major obstacle to building effective tweet classifiers, and it is especially pronounced in studies that include abuse-related slang terms. In this study, we address the problem by designing an ensemble deep learning model that leverages both word-level and character-level features to classify abuse-related tweets. We report experiments on a Twitter dataset in which the proportions of the two classes (abuse vs. non-abuse) can be configured to simulate varying degrees of imbalance. Results show that our ensemble deep learning models outperform ensembles of traditional machine learning models, especially on heavily imbalanced datasets.
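The two ideas in the abstract, configuring class proportions to simulate imbalance and combining a word-level model with a character-level model, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The helper names (`make_imbalanced_split`, `soft_vote`) are hypothetical. Combining members by averaging their predicted abuse probabilities (soft voting) is an assumption, because the abstract does not say how the ensemble merges its members. The deep models themselves are omitted, and their outputs are represented by placeholder probability lists.

```python
import random
from typing import List, Sequence, Tuple

# (tweet text, label), where 1 = abuse-related and 0 = non-abuse
Example = Tuple[str, int]


def make_imbalanced_split(examples: Sequence[Example], pos_ratio: float,
                          size: int, seed: int = 0) -> List[Example]:
    """Sample `size` examples so that about `pos_ratio` of them are positive.

    Hypothetical helper: the paper only states that the class percentages
    are configurable, not how the subsets are drawn.
    """
    if not 0.0 < pos_ratio < 1.0:
        raise ValueError("pos_ratio must be strictly between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    pos = [e for e in examples if e[1] == 1]
    neg = [e for e in examples if e[1] == 0]
    n_pos = round(size * pos_ratio)
    n_neg = size - n_pos
    if n_pos > len(pos) or n_neg > len(neg):
        raise ValueError("not enough examples for the requested split")
    sample = rng.sample(pos, n_pos) + rng.sample(neg, n_neg)
    rng.shuffle(sample)  # avoid all positives appearing first
    return sample


def soft_vote(prob_lists: Sequence[Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average each member's P(abuse) per tweet, then apply a threshold.

    Assumed combination rule (soft voting); each inner list holds one
    model's probabilities, e.g. a word-level and a character-level model.
    """
    n_models = len(prob_lists)
    averaged = [sum(ps) / n_models for ps in zip(*prob_lists)]
    return [1 if p >= threshold else 0 for p in averaged]


if __name__ == "__main__":
    data = [(f"pos {i}", 1) for i in range(50)] + \
           [(f"neg {i}", 0) for i in range(950)]
    subset = make_imbalanced_split(data, pos_ratio=0.05, size=200, seed=42)
    print(sum(label for _, label in subset))  # 10 positives out of 200

    word_probs = [0.9, 0.2, 0.6]  # placeholder word-level model outputs
    char_probs = [0.7, 0.4, 0.2]  # placeholder character-level model outputs
    print(soft_vote([word_probs, char_probs]))  # [1, 0, 0]
```

Lowering `pos_ratio` (for example from 0.5 to 0.05) while keeping `size` fixed is one way to produce the increasingly imbalanced settings that the abstract describes.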
