Developing a Process for the Analysis of User Journeys and the Prediction of Dropout in Digital Health Interventions: Machine Learning Approach

Background: User dropout is a widespread concern in the delivery and evaluation of digital (ie, web and mobile apps) health interventions. Researchers have yet to fully realize the potential of the large amount of data generated by these technology-based programs. Of particular interest is the ability to predict who will drop out of an intervention. This may be possible through the analysis of user journey data, both self-reported and system-generated, produced by the path (or journey) an individual takes to navigate through a digital health intervention.

Objective: The purpose of this study is to provide a step-by-step process for the analysis of user journey data and eventually to predict dropout in the context of digital health interventions. The process is applied to data from an internet-based intervention for insomnia as a way to illustrate its use. The completion of the program is contingent upon completing 7 sequential cores, which include an initial tutorial core. Dropout is defined as not completing the seventh core.

Methods: Steps of user journey analysis, including data transformation, feature engineering, and statistical model analysis and evaluation, are presented. Dropouts were predicted based on data from 151 participants from a fully automated web-based program (Sleep Healthy Using the Internet) that delivers cognitive behavioral therapy for insomnia. Logistic regression with L1 and L2 regularization, support vector machines, and boosted decision trees were used and evaluated based on their predictive performance. Relevant features from the data are reported that predict user dropout.

Results: Accuracy of predicting dropout (area under the curve [AUC] values) varied depending on the program core and the machine learning technique. After model evaluation, boosted decision trees achieved AUC values ranging between 0.6 and 0.9. Additional handcrafted features, including time to complete certain steps of the intervention, time to get out of bed, and days since the last interaction with the system, contributed to the prediction performance.

Conclusions: The results support the feasibility and potential of analyzing user journey data to predict dropout. Theory-driven handcrafted features increased the prediction performance. The ability to predict dropout at an individual level could be used to enhance decision making for researchers and clinicians as well as inform dynamic intervention regimens.
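To make three ideas from the abstract concrete (the dropout definition, one handcrafted feature, and AUC as the evaluation metric), the following is a minimal sketch rather than the authors' pipeline. The event log layout, the user IDs, the dates, and the function names `dropout_labels`, `days_since_last_interaction`, and `roc_auc` are all assumptions invented for illustration. The study's actual models (regularized logistic regression, support vector machines, boosted decision trees) are not reproduced here; the sketch simply scores users by a single feature.

```python
from datetime import date

# Hypothetical event log: (user_id, event_date, core_completed or None).
# All identifiers and values are invented for illustration only.
EVENTS = [
    ("u1", date(2020, 1, 1), 1),
    ("u1", date(2020, 1, 8), 2),
    ("u1", date(2020, 2, 20), 7),
    ("u2", date(2020, 1, 2), 1),
    ("u2", date(2020, 1, 5), None),  # e.g. a diary entry, no core completed
    ("u3", date(2020, 1, 3), 1),
    ("u3", date(2020, 1, 30), 3),
]
FINAL_CORE = 7  # the abstract defines dropout as not completing the seventh core


def dropout_labels(events, final_core=FINAL_CORE):
    """Label a user 1 (dropout) unless they completed the final core."""
    labels = {}
    for user, _, core in events:
        labels.setdefault(user, 1)
        if core == final_core:
            labels[user] = 0
    return labels


def days_since_last_interaction(events, as_of):
    """Handcrafted feature: days between a user's latest event and `as_of`.

    Events after `as_of` are ignored so the feature only uses past data.
    """
    last_seen = {}
    for user, day, _ in events:
        if day <= as_of and (user not in last_seen or day > last_seen[user]):
            last_seen[user] = day
    return {user: (as_of - day).days for user, day in last_seen.items()}


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = dropout_labels(EVENTS)
    feature = days_since_last_interaction(EVENTS, as_of=date(2020, 2, 1))
    users = sorted(labels)
    # Use the raw feature as a naive risk score: longer inactivity, higher risk.
    print(roc_auc([labels[u] for u in users], [feature[u] for u in users]))
```

In a fuller analysis, the feature values would be passed to a trained classifier and the AUC computed on held-out users, separately for each program core, since the abstract reports that performance varied by core.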
