A Machine Learning Approach for the Detection and Characterization of Illicit Drug Dealers on Instagram: Model Evaluation Study

Background: Social media use is now ubiquitous, but the growth in social media communications has also made these platforms a convenient channel for drug dealers selling controlled substances, opioids, and other illicit drugs. Previous studies and news investigations have reported the use of popular social media platforms as conduits for opioid sales. This study uses deep learning to detect illicit drug dealing on the image and video sharing platform Instagram.

Objective: The aim of this study was to develop and evaluate a machine learning approach to detect Instagram posts related to illegal internet drug dealing.

Methods: We describe an approach that applies a deep learning model to Instagram posts to detect drug dealers. We collected Instagram posts using a Web scraper between July 2018 and October 2018 and then compared our deep learning model against 3 other machine learning models (random forest, decision tree, and support vector machine) to assess its performance and accuracy. For our deep learning model, we used long short-term memory units in a recurrent neural network to learn the textual patterns of drug dealing posts. We also manually annotated all collected posts to evaluate model performance and to characterize drug selling conversations.

Results: From the 12,857 posts we collected, we detected 1228 drug dealer posts from 267 unique users. We used cross-validation to evaluate the 4 models; our deep learning model reached an F1 score of 95% and outperformed the other 3 models. We also found that the model performed better when hashtags were removed from the text. Detected posts contained hashtags related to several drugs, including the controlled substances Xanax (1078/1228, 87.78%) and oxycodone/OxyContin (321/1228, 26.14%) and the illicit drugs lysergic acid diethylamide (213/1228, 17.34%) and 3,4-methylenedioxy-methamphetamine (94/1228, 7.65%). We also observed, through user comments, the use of communication applications for suspected drug trading.

Conclusions: Our approach, which combines Web scraping and deep learning, detected illegal online drug sellers on Instagram with high accuracy. Despite increased scrutiny by regulators and policymakers, the Instagram platform continues to host posts from drug dealers, in violation of federal law. Further action is needed to ensure the safety of social media communities and to help put an end to this illicit digital channel of sourcing.
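The evaluation steps named in the abstract (removing hashtags from post text, splitting the data for cross-validation, and scoring with F1) can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' code. The function names, the regular expression, and the contiguous fold layout are all assumptions made for illustration, and the abstract does not say whether hashtag removal dropped whole tags or only the `#` symbol; the sketch assumes whole tags.

```python
import re

# Assumption: a hashtag is '#' followed by word characters; the whole token is dropped.
HASHTAG = re.compile(r"#\w+")


def strip_hashtags(text):
    """Remove '#tag' tokens and collapse the whitespace left behind."""
    return " ".join(HASHTAG.sub(" ", text).split())


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds over n items."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical post text, not taken from the study's data set.
    print(strip_hashtags("DM to order #xanax #bars now"))
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))
    for train, test in k_fold_indices(5, 2):
        print(train, test)
```

In practice each fold's training indices would be used to fit a classifier (the study compares a long short-term memory network with random forest, decision tree, and support vector machine models), and the per-fold F1 scores on the held-out indices would be averaged.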
