Automatic Opioid User Detection from Twitter: Transductive Ensemble Built on Different Meta-graph Based Similarities over Heterogeneous Information Network

Opioid (e.g., heroin and morphine) addiction has become one of the largest and deadliest epidemics in the United States. To help combat this epidemic, in this paper we propose a novel framework named HinOPU to automatically detect opioid users from Twitter, which will help sharpen our understanding of the behavioral processes of opioid addiction and treatment. In HinOPU, we introduce a structured heterogeneous information network (HIN) to model users, their posted tweets, and the rich relationships among them. We then use a meta-graph based approach to characterize the semantic relatedness between users, formulating a separate user similarity for each meta-graph defined on the HIN. To reduce the cost of acquiring labeled samples for supervised learning, we propose a transductive classification method that builds one base classifier on each of these meta-graph based similarities. To further improve detection accuracy, we then construct an ensemble that combines the predictions of the different base classifiers. Comprehensive experiments on real sample collections from Twitter validate the effectiveness of HinOPU for opioid user detection in comparison with alternative methods.
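The pipeline described above (HIN, similarity per meta-graph, transductive base classifiers, ensemble) can be illustrated with a small sketch. The abstract does not specify HinOPU's actual meta-graphs, transductive formulation, or ensemble weighting, so everything here is an assumption: meta-path commuting matrices stand in for meta-graphs, a Hadamard product of two path matrices plays the role of a simple two-branch meta-graph, each base classifier is plain label spreading in the style of Zhou et al. ("Learning with Local and Global Consistency"), and the ensemble is a uniform average. The function names (`commuting_matrix`, `normalize_similarity`, `label_spreading`, `ensemble_predict`) and the toy network are hypothetical.

```python
import numpy as np


def commuting_matrix(*adjacencies):
    """Chain adjacency matrices along a meta-path, e.g. U-T-K-T-U (assumed stand-in for a meta-graph)."""
    result = adjacencies[0]
    for adj in adjacencies[1:]:
        result = result @ adj
    return result


def normalize_similarity(M):
    """Symmetric normalization S = D^{-1/2} W D^{-1/2} with self-loops removed."""
    W = np.array(M, dtype=float)
    np.fill_diagonal(W, 0.0)
    degree = W.sum(axis=1)
    inv_sqrt = np.zeros_like(degree)
    nonzero = degree > 0
    inv_sqrt[nonzero] = 1.0 / np.sqrt(degree[nonzero])
    return W * inv_sqrt[:, None] * inv_sqrt[None, :]


def label_spreading(S, Y, alpha=0.8):
    """Closed-form label spreading F = (I - alpha * S)^{-1} Y, row-normalized (assumed base classifier)."""
    n = S.shape[0]
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    row_sums = F.sum(axis=1, keepdims=True)
    # Users with no path to any labeled user keep an all-zero score row.
    return np.divide(F, row_sums, out=np.zeros_like(F), where=row_sums > 0)


def ensemble_predict(similarities, Y, weights=None, alpha=0.8):
    """Weighted average of the base classifiers' class scores, then arg-max per user (uniform weights assumed)."""
    if weights is None:
        weights = np.full(len(similarities), 1.0 / len(similarities))
    scores = sum(w * label_spreading(normalize_similarity(M), Y, alpha)
                 for w, M in zip(weights, similarities))
    return scores.argmax(axis=1)


# Hypothetical toy HIN: 6 users, each posting one tweet; 2 keywords.
A_UT = np.eye(6)                                 # user i posts tweet i
A_TK = np.array([[1, 0], [1, 0], [1, 0],
                 [0, 1], [0, 1], [0, 1]])        # tweet contains keyword
A_UU = np.zeros((6, 6))                          # hypothetical follow relation
for i, j in [(0, 1), (1, 2), (3, 4), (4, 5)]:
    A_UU[i, j] = A_UU[j, i] = 1

M_keyword = commuting_matrix(A_UT, A_TK, A_TK.T, A_UT.T)  # U-T-K-T-U
M_follow = A_UU                                           # U-U
M_graph = M_keyword * M_follow  # Hadamard product: both relations must hold

# Column 0 = non-user, column 1 = opioid user; only u0 and u3 are labeled.
Y = np.zeros((6, 2))
Y[0, 1] = 1
Y[3, 0] = 1

print(ensemble_predict([M_keyword, M_follow, M_graph], Y))  # → [1 1 1 0 0 0]
```

In this toy network every similarity splits the users into the same two groups, so all three base classifiers agree and the unlabeled users inherit the label of the one labeled user in their group. On real data the similarities would disagree, which is where a learned (rather than uniform) ensemble weighting would matter.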
