Detecting Recovery Problems Just in Time: Application of Automated Linguistic Analysis and Supervised Machine Learning to an Online Substance Abuse Forum

Background: Online discussion forums allow those in addiction recovery to seek help through text-based messages, including when facing triggers to drink or use drugs. Trained staff (or “moderators”) may participate within these forums to offer guidance and support when participants are struggling, but they must expend considerable effort to continually review new content. Demands on moderators limit the scalability of evidence-based digital health interventions.

Objective: Automated identification of recovery problems could allow moderators to engage in more timely and efficient ways with participants who are struggling. This paper aimed to investigate whether computational linguistics and supervised machine learning can be applied to successfully flag, in real time, those discussion forum messages that moderators find most concerning.

Methods: Training data came from a trial of a mobile phone-based health intervention for individuals in recovery from alcohol use disorder, with human coders labeling discussion forum messages according to whether or not authors mentioned problems in their recovery process. Linguistic features of these messages were extracted via several computational techniques: (1) a Bag-of-Words approach, (2) the dictionary-based Linguistic Inquiry and Word Count program, and (3) a hybrid approach combining the most important features from both Bag-of-Words and Linguistic Inquiry and Word Count. These features were applied within binary classifiers leveraging several methods of supervised machine learning: support vector machines, decision trees, and boosted decision trees. Classifiers were evaluated in data from a later deployment of the recovery support intervention.

Results: To distinguish recovery problem disclosures, the Bag-of-Words approach relied on domain-specific language, including words explicitly linked to substance use and mental health (“drink,” “relapse,” “depression,” and so on), whereas the Linguistic Inquiry and Word Count approach relied on language characteristics such as tone, affect, insight, the presence of quantifiers and time references, and pronouns. A boosted decision tree classifier utilizing features from both Bag-of-Words and Linguistic Inquiry and Word Count performed best in identifying problems disclosed within the discussion forum, achieving 88% sensitivity and 82% specificity in a separate cohort of patients in recovery.

Conclusions: Differences in language use can distinguish messages disclosing recovery problems from other message types. Incorporating machine learning models based on language use allows real-time flagging of concerning content such that trained staff may engage more efficiently and focus their attention on time-sensitive issues.
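As a rough illustration of two ideas from the abstract, the sketch below shows how a message can be turned into Bag-of-Words counts and how sensitivity and specificity are computed from predicted and true labels. It is not the authors' pipeline: the example messages, the labels, and the keyword rule (`FLAG_WORDS`, `flag`) are hypothetical stand-ins for the trained classifiers described in the paper, chosen only so the example runs with the Python standard library.

```python
import re
from collections import Counter


def bag_of_words(message):
    """Lowercase a message and count each word token."""
    return Counter(re.findall(r"[a-z']+", message.lower()))


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, where 1 = problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy messages and labels (1 = recovery problem); not study data.
messages = [
    "I had a drink last night and feel awful",
    "Great meeting today, feeling grateful",
    "Worried I might relapse this weekend",
    "Happy birthday to my sponsor",
    "I did not drink at my cousin's wedding, so proud",
]
labels = [1, 0, 1, 0, 0]

# Hypothetical keyword rule standing in for a trained classifier.
FLAG_WORDS = {"drink", "relapse", "depression"}


def flag(message):
    """Flag a message if any of the keywords appears in its word counts."""
    counts = bag_of_words(message)
    return int(any(counts[word] > 0 for word in FLAG_WORDS))


predictions = [flag(m) for m in messages]
sens, spec = sensitivity_specificity(labels, predictions)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy data the last message is flagged even though it reports a success. A keyword rule cannot tell that kind of message apart, which is one reason to use learned classifiers over richer features instead.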
