Utilizing Social Media to Combat Opioid Addiction Epidemic: Automatic Detection of Opioid Users from Twitter

Opioid (e.g., heroin and morphine) addiction has become one of the largest and deadliest epidemics in the United States. To combat this epidemic, we propose a novel framework named AutoOPU that automatically detects opioid users on Twitter, which will help sharpen our understanding of the behavioral processes of opioid addiction and treatment. In AutoOPU, we first introduce a heterogeneous information network (HIN) to represent users, the tweets they post, and the rich relationships among them. We then use a meta-structure-based approach to characterize the semantic relatedness between users. Afterwards, we integrate content-based similarity with the relatedness depicted by each meta-structure to formulate a similarity measure over users. Finally, we aggregate the resulting similarities using multi-kernel learning, in which the learning algorithm automatically weights each similarity to make predictions. To the best of our knowledge, this is the first work to use multi-kernel learning based on meta-structures over an HIN for biomedical knowledge mining, especially in the drug-addiction domain. We conduct comprehensive experiments on real sample collections from Twitter to validate the effectiveness of AutoOPU in opioid user detection through comparisons with alternative methods.
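The pipeline summarized above (HIN, meta-structure relatedness, fusion with content similarity, multi-kernel aggregation) can be illustrated with a minimal plain-Python sketch. It rests on several assumptions that are not taken from the paper. Meta-structures are restricted to their simplest special case, symmetric meta-paths such as User-Tweet-Keyword-Tweet-User. Relatedness is the cosine-normalized commuting matrix (PathSim uses a different normalization). Content similarity and meta-path relatedness are fused by an element-wise (Hadamard) product. The learned multi-kernel weights are replaced by a simple kernel-target-alignment heuristic. The entity types, relation matrices, helper names (`meta_path_kernel`, `combine_kernels`, and so on), and toy data are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def gram_cosine(a: Matrix) -> Matrix:
    """Cosine-normalized Gram matrix G_ij = <a_i, a_j> / (|a_i| |a_j|).

    A Gram matrix is positive semidefinite, so the result is a valid kernel.
    All-zero rows get similarity 0."""
    g = matmul(a, [list(col) for col in zip(*a)])
    n = len(g)
    return [[g[i][j] / math.sqrt(g[i][i] * g[j][j]) if g[i][i] * g[j][j] > 0 else 0.0
             for j in range(n)] for i in range(n)]


def meta_path_kernel(half_path: Sequence[Matrix]) -> Matrix:
    """Hypothetical helper: user relatedness along a symmetric meta-path.

    half_path = [user_tweet, tweet_keyword] encodes User-Tweet-Keyword; the
    commuting matrix of User-Tweet-Keyword-Tweet-User is A A^T with
    A = user_tweet @ tweet_keyword, which is then cosine-normalized."""
    a = half_path[0]
    for rel in half_path[1:]:
        a = matmul(a, rel)
    return gram_cosine(a)


def hadamard(k1: Matrix, k2: Matrix) -> Matrix:
    """Element-wise product; PSD inputs give a PSD output (Schur product theorem)."""
    return [[x * y for x, y in zip(r1, r2)] for r1, r2 in zip(k1, k2)]


def alignment(k: Matrix, y: Sequence[int]) -> float:
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F), y in {+1, -1}."""
    n = len(y)
    inner = sum(y[i] * y[j] * k[i][j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(v * v for row in k for v in row))
    return inner / (norm_k * n) if norm_k > 0 else 0.0  # ||yy^T||_F = n


def combine_kernels(kernels: Sequence[Matrix], y: Sequence[int]) -> Tuple[List[float], Matrix]:
    """Stand-in for multi-kernel learning: weight each kernel by its clipped
    alignment with the labels, normalized so the weights sum to 1."""
    raw = [max(alignment(k, y), 0.0) for k in kernels]
    total = sum(raw)
    weights = [r / total for r in raw] if total > 0 else [1.0 / len(kernels)] * len(kernels)
    n = len(y)
    combined = [[sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
                for i in range(n)]
    return weights, combined


# Toy HIN with 4 users, each posting one tweet (all values hypothetical).
user_tweet = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
tweet_keyword = [[1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
tweet_hashtag = [[1, 0], [1, 0], [0, 1], [1, 1]]
user_content = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.3, 0.7]]  # e.g. topic mix
labels = [1, 1, -1, -1]  # +1 = opioid user, -1 = non-user

content_sim = gram_cosine(user_content)
kernels = [hadamard(content_sim, meta_path_kernel([user_tweet, tweet_keyword])),
           hadamard(content_sim, meta_path_kernel([user_tweet, tweet_hashtag]))]
weights, combined = combine_kernels(kernels, labels)
print([round(w, 3) for w in weights])
```

Because both factors of each Hadamard product are cosine-normalized Gram matrices, every per-meta-path kernel stays positive semidefinite, so the weighted sum could be passed to any classifier that accepts a precomputed kernel, such as an SVM. In a realistic setting the alignment would be computed only over the labeled users, and a proper multi-kernel learning algorithm would learn the weights jointly with the classifier.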
