Mining user-generated content in an online smoking cessation community to identify smoking status: A machine learning approach

Online smoking cessation communities help hundreds of thousands of smokers quit smoking and stay abstinent each year. Content shared by users of such communities may contain important information that could enable more effective and personally tailored cessation treatment recommendations. This study demonstrates a novel approach to determining individuals' smoking status by applying machine learning techniques to classify user-generated content in an online cessation community. Study data came from BecomeAnEX.org, a large online smoking cessation community. We extracted three types of novel features from each post: domain-specific features, author-based features, and thread-based features. These features improved smoking status identification (quit vs. not) performance by 9.7% compared with using only text features of a post's content. In other words, knowledge from domain experts, data on the post author's patterns of online engagement, and other community members' reactions to the post can help determine the focal post author's smoking status, over and above the actual content of the focal post. We demonstrated that machine learning methods can be applied to user-generated data from online cessation communities to validly and reliably discern important user characteristics, which could aid decision support for intervention tailoring.
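
The feature combination described above can be illustrated with a minimal sketch. It is not the study's implementation: the cue phrases, the `Post` fields, and every feature name below are hypothetical placeholders, chosen only to show how text features of a post might be joined with domain-specific, author-based, and thread-based features into a single vector for a classifier.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical cue phrases standing in for expert-curated domain knowledge.
QUIT_CUES = ("smoke free", "days quit", "quit date")


@dataclass
class Post:
    text: str
    author_post_count: int  # hypothetical measure of the author's engagement
    reply_texts: List[str] = field(default_factory=list)  # replies in the thread


def text_features(text: str) -> Dict[str, float]:
    """Bag-of-words counts over the post's own content."""
    feats: Dict[str, float] = {}
    for token in text.lower().split():
        key = "w:" + token
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


def domain_features(text: str) -> Dict[str, float]:
    """1.0 if a cue phrase occurs in the post, else 0.0."""
    lowered = text.lower()
    return {"domain:" + cue: float(cue in lowered) for cue in QUIT_CUES}


def author_features(post: Post) -> Dict[str, float]:
    """Features describing the author's pattern of online engagement."""
    return {"author:post_count": float(post.author_post_count)}


def thread_features(post: Post) -> Dict[str, float]:
    """Features describing how other members reacted to the post."""
    congrats = sum("congrat" in reply.lower() for reply in post.reply_texts)
    return {
        "thread:num_replies": float(len(post.reply_texts)),
        "thread:congrats_replies": float(congrats),
    }


def build_features(post: Post) -> Dict[str, float]:
    """Merge all four feature groups into one sparse feature dictionary."""
    feats: Dict[str, float] = {}
    feats.update(text_features(post.text))
    feats.update(domain_features(post.text))
    feats.update(author_features(post))
    feats.update(thread_features(post))
    return feats
```

A dictionary of this form could then be vectorized and passed to any standard classifier; prefixing each key with its feature group keeps the groups from colliding and makes it easy to compare models trained with and without a given group.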
