Employing online social networks in precision-medicine approach using information fusion predictive model to improve substance use surveillance: A lesson from Twitter and marijuana consumption

Abstract

The mutual impact between connected communities and precision health or medicine offers opportunities for many types of research and survey information, e.g., predicting trends in health-related issues, specifically people's behavior towards drug use. Here, precision medicine shapes the way the information is treated so as to obtain better outcomes that support stakeholders' decisions. Online social network analysis appears to be a good tool for quickly monitoring population behavior, since users freely share large amounts of information about their own lives on a day-to-day basis. This novel kind of data can provide additional real-time insights into people's actual behavior related to drug use (Cortes et al., 2017). The aim of this research is to generate an information fusion model of the marijuana use tendency that is reliable enough to be employed by stakeholders. To this end, we: (a) collect and process data from Twitter; (b) design a set of algorithms to estimate the tendency of marijuana use in relation to age, location, and gender, and apply a set of processes and activities to verify that our model performs as expected; and (c) fuse the information into a model that fully characterizes the marijuana-using population, comparable to the national marijuana consumption survey, so that policy makers can use it to improve drug use prevention. First, we collected data from Twitter accounts based in Chile using an algorithm for traversing graph data structures. Then, we estimated the prevalence of marijuana users over the period from 2006 to 2018 and, within each year, predicted the prevalence of the user population in relation to age (by range), location (by region), and gender.
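The abstract does not say which graph traversal algorithm was used to collect Chilean accounts. The sketch below assumes a breadth-first search (BFS) over the follower graph, starting from a few seed accounts and expanding only accounts that pass a location filter. The names `collect_accounts`, `get_neighbors`, `is_in_chile`, and `max_accounts` are hypothetical; in practice `get_neighbors` would wrap calls to the Twitter API, which are replaced here by an in-memory toy graph.

```python
from collections import deque


def collect_accounts(seeds, get_neighbors, is_in_chile, max_accounts=1000):
    """Breadth-first traversal of a follower graph (hypothetical helper).

    seeds         -- account ids to start from
    get_neighbors -- callable returning the accounts connected to an account
                     (stand-in for a Twitter API call; an assumption here)
    is_in_chile   -- callable deciding whether an account is located in Chile
    max_accounts  -- stop once this many in-scope accounts are collected
    """
    visited = set()
    queue = deque()
    for seed in seeds:
        if seed not in visited:
            visited.add(seed)
            queue.append(seed)

    collected = []
    while queue and len(collected) < max_accounts:
        account = queue.popleft()  # FIFO order gives breadth-first expansion
        if not is_in_chile(account):
            continue  # out-of-scope accounts are neither kept nor expanded
        collected.append(account)
        for neighbor in get_neighbors(account):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return collected


# Toy follower graph and locations (illustrative only).
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d", "e"], "d": [], "e": ["a"]}
location = {"a": "CL", "b": "CL", "c": "AR", "d": "CL", "e": "CL"}

result = collect_accounts(
    ["a"],
    get_neighbors=lambda u: graph.get(u, []),
    is_in_chile=lambda u: location.get(u) == "CL",
)
print(result)  # ['a', 'b', 'd']
```

Filtering before expansion keeps the crawl inside the target population, at a cost: in the toy graph, account `e` is never reached because its only inbound path runs through the out-of-country account `c`.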
Finally, we built indicators to explore the similarity between the data obtained through Twitter (our results) and the actual data collected by the National Service for the Prevention and Rehabilitation of Drug and Alcohol Consumption (SENDA) for the same variables analyzed in its own survey. When we compared the results of our algorithms and methods with those provided by SENDA, we observed that most of the indicators show similar trends, i.e., the year-to-year variation in prevalence by age, location, and gender showed similar changes in both analyses. These results also show the effectiveness of the algorithms and their capacity to predict variations in complex cases such as marijuana use in the Chilean population. This study offers a key opportunity to obtain information about marijuana use in a faster, low-cost, and continuous way; it is also an excellent tool for marijuana surveillance that provides information to support the decisions of policy makers and stakeholders.
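As a minimal sketch of how such indicators could be computed, the code below assumes that each collected user has already been assigned a year, a demographic group (e.g., an age range), and a binary marijuana-use label by upstream classifiers. Prevalence per (year, group) is taken as the share of positively labeled users, and the trend-similarity indicator is the fraction of consecutive-year changes whose direction (up, down, or flat) agrees between the Twitter series and the survey series. Both definitions, and the names `prevalence_by_group` and `trend_agreement`, are assumptions for illustration rather than the paper's method; the numbers are toy values, not actual Twitter or SENDA figures.

```python
from collections import defaultdict


def prevalence_by_group(users):
    """Share of users labelled as marijuana users per (year, group).

    users -- iterable of (year, group, is_user) tuples; the labels are assumed
             to come from upstream classifiers (hypothetical here).
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for year, group, is_user in users:
        totals[(year, group)] += 1
        positives[(year, group)] += int(is_user)
    return {key: positives[key] / totals[key] for key in totals}


def trend_agreement(series_a, series_b):
    """Fraction of consecutive-year changes with the same sign in both series.

    Each series maps year -> prevalence; only years present in both are used.
    """
    years = sorted(set(series_a) & set(series_b))
    if len(years) < 2:
        raise ValueError("need at least two years common to both series")

    def sign(x):
        return (x > 0) - (x < 0)  # +1 up, -1 down, 0 flat

    matches = 0
    for prev, curr in zip(years, years[1:]):
        change_a = series_a[curr] - series_a[prev]
        change_b = series_b[curr] - series_b[prev]
        matches += sign(change_a) == sign(change_b)
    return matches / (len(years) - 1)


# Toy data for illustration only (not real Twitter or SENDA figures).
twitter_users = (
    [(2014, "18-25", flag) for flag in (True, False, False, False)]
    + [(2016, "18-25", flag) for flag in (True, True, False, False)]
    + [(2018, "18-25", flag) for flag in (True, False, False, False)]
)
prevalence = prevalence_by_group(twitter_users)
twitter_series = {year: p for (year, group), p in prevalence.items() if group == "18-25"}
survey_series = {2014: 0.30, 2016: 0.34, 2018: 0.36}

print(twitter_series)  # {2014: 0.25, 2016: 0.5, 2018: 0.25}
print(trend_agreement(twitter_series, survey_series))  # 0.5
```

Comparing directions of change rather than absolute levels matches the kind of agreement the abstract reports (similar trends across years); a correlation coefficient over the same years would be an alternative indicator.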

[1] Amit P. Sheth et al. Predictive Analysis on Twitter: Techniques and Applications, 2018.

[2] Marc A. Zimmerman et al. Permissive norms and young adults' alcohol and marijuana use: the role of online communities, 2012, Journal of Studies on Alcohol and Drugs.

[3] Yen S. Low et al. Text Mining for Adverse Drug Events: the Promise, Challenges, and State of the Art, 2014, Drug Safety.

[4] Lena Shah et al. Incorporating geographic settings into a social network analysis of injection drug use and bloodborne pathogen prevalence, 2007, Health & Place.

[5] G. Eysenbach. Infodemiology and Infoveillance: Framework for an Emerging Set of Public Health Informatics Methods to Analyze Search, Communication and Publication Behavior on the Internet, 2009, Journal of Medical Internet Research.

[6] Ting Wang et al. Who will retweet me?: finding retweeters in twitter, 2013, SIGIR.

[7] Clayton M. Christensen. Innovation and the general manager, 1999.

[8] Nicholas Genes et al. Leveraging Social Networks for Toxicovigilance, 2013, Journal of Medical Toxicology.

[9] Daniel J. Brass et al. Network Analysis in the Social Sciences, 2009, Science.

[10] Francois R. Lamy et al. "Time for dabs": Analyzing Twitter data on marijuana concentrates across the U.S., 2015, Drug and Alcohol Dependence.

[11] Juan D. Velásquez et al. Web mining and privacy concerns: Some important legal issues to be consider before applying any data and information extraction technique in web-based environments, 2013, Expert Syst. Appl.

[12] U. Ghitza. ASPIRE Model for Treating Cannabis and Other Substance Use Disorders: A Novel Personalized-Medicine Framework, 2014, Front. Psychiatry.

[13] Melissa J. Krauss et al. Twitter chatter about marijuana, 2015, Journal of Adolescent Health.

[14] Mark Dredze et al. Could behavioral medicine lead the web data revolution?, 2014, JAMA.

[15] Cuixian Chen et al. Face age estimation using model selection, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[16] Melissa J. Krauss et al. A content analysis of tweets about high-potency marijuana, 2016, Drug and Alcohol Dependence.

[17] Huan Liu et al. Your Age Is No Secret: Inferring Microbloggers' Ages via Content and Interaction Analysis, 2016, ICWSM.

[18] Juan D. Velásquez et al. Twitter for marijuana infodemiology, 2017, WI.

[19] Alyson G. Wilson et al. Twitter Geolocation, 2018, ACM Trans. Knowl. Discov. Data.

[20] Wen Li et al. Gender Prediction for Chinese Social Media Data, 2017, RANLP.

[21] Robert F. Chew et al. Predicting age groups of Twitter users based on language and metadata features, 2017, PLoS ONE.

[22] D. Henry et al. Interplay of Network Position and Peer Substance Use in Early Adolescent Cigarette, Alcohol, and Marijuana Use, 2010.

[23] Vijay V. Raghavan et al. Detecting adverse drug effects using link classification on twitter data, 2015, 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).

[24] Aixin Sun et al. A Survey of Location Prediction on Twitter, 2017, IEEE Transactions on Knowledge and Data Engineering.

[25] A. Pedrana et al. Making the most of a brave new world: opportunities and considerations for using Twitter as a public health monitoring tool, 2014, Preventive Medicine.

[26] Amit P. Sheth et al. PREDOSE: A semantic web platform for drug abuse epidemiology using social media, 2013, J. Biomed. Informatics.

[27] Karl E. Bauman et al. The Peer Context of Adolescent Substance Use: Findings from Social Network Analysis, 2006.

[28] Soumya Banerjee et al. Web Opinion Mining and Sentimental Analysis, 2013.

[29] John D. Burger et al. Discriminating Gender on Twitter, 2011, EMNLP.

[30] Michal Karpowicz et al. Reprint of: Computational approaches for mining user's opinions on the Web 2.0, 2015, Inf. Process. Manag.

[31] Priya Anand et al. Focused web crawlers and its approaches, 2015, 2015 International Conference on Futuristic Trends on Computational Analysis and Knowledge Management (ABLAZE).

[32] Dong Nguyen et al. "How Old Do You Think I Am?" A Study of Language and Age in Twitter, 2013, ICWSM.

[33] Hosung Park et al. What is Twitter, a social network or a news media?, 2010, WWW '10.

[34] José Carlos González Cristóbal et al. TASS - Workshop on Sentiment Analysis at SEPLN, 2013.

[35] Timothy Baldwin et al. Text-Based Twitter User Geolocation Prediction, 2014, J. Artif. Intell. Res.

[36] Michal Karpowicz et al. Opinion Mining on the Web 2.0 - Characteristics of User Generated Content and Their Impacts, 2013, CHI-KDD.

[37] Carolyn Penstein Rosé et al. Author Age Prediction from Text using Linear Regression, 2011, LaTeCH@ACL.

[38] Hongying Dai et al. Mining social media data for opinion polarities about electronic cigarettes, 2016, Tobacco Control.

[39] Jorge A. Balazs et al. Opinion Mining and Information Fusion: A survey, 2016, Inf. Fusion.

[40] Udi E. Ghitza. Needed Relapse-Prevention Research on Novel Framework (ASPIRE Model) for Substance Use Disorders Treatment, 2015, Front. Psychiatry.

[41] Barry Wellman et al. Social Network Analysis: An Introduction, 2010.

[42] Juan D. Velásquez et al. Web site keywords: A methodology for improving gradually the web site text content, 2012, Intell. Data Anal.

[43] Gunther Eysenbach et al. Infodemiology and infoveillance: tracking online health information and cyberbehavior for public health, 2011, American Journal of Preventive Medicine.

[44] Krishnaprasad Thirunarayan et al. “When ‘Bad’ is ‘Good’”: Identifying Personal Communication and Sentiment in Drug-Related Tweets, 2016, JMIR Public Health and Surveillance.

[45] Yoonsang Kim et al. Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection, 2016, Journal of Medical Internet Research.

[46] J. Bauermeister et al. Online Network Influences on Emerging Adults’ Alcohol and Drug Use, 2013, Journal of Youth and Adolescence.

[47] Stephan M. Winkler et al. On Text Preprocessing for Opinion Mining Outside of Laboratory Environments, 2012, AMT.

[48] Ian Portelli et al. Drug Use in the Twittersphere: A Qualitative Contextual Analysis of Tweets About Prescription Drugs, 2015, Journal of Addictive Diseases.

[49] D. Ruths et al. What's in a Name? Using First Names as Features for Gender Inference in Twitter, 2013, AAAI Spring Symposium: Analyzing Microtext.

[50] Mark Dredze et al. You Are What You Tweet: Analyzing Twitter for Public Health, 2011, ICWSM.

[51] Kathryn A. Phillips et al. Precision Medicine: From Science To Value, 2018, Health Affairs.

[52] Baoxin Li et al. Finding needles of interested tweets in the haystack of Twitter network, 2016, 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).

[53] A. Amialchuk et al. The Social Contagion Effect of Marijuana Use among Adolescents, 2011, PLoS ONE.

[54] Zhiyuan Liu et al. Discriminating gender on Chinese microblog: A study of online behaviour, writing style and preferred vocabulary, 2014, 2014 10th International Conference on Natural Computation (ICNC).