Interrater Reliability of mHealth App Rating Measures: Analysis of Top Depression and Smoking Cessation Apps

Background: There are over 165,000 mHealth apps currently available to patients, but few have undergone an external quality review. Furthermore, no standardized review method exists, and little has been done to examine the consistency of the evaluation systems themselves.

Objective: We sought to determine which measures for evaluating the quality of mHealth apps have the greatest interrater reliability.

Methods: We identified 22 measures for evaluating the quality of apps from the literature. A panel of 6 reviewers reviewed the top 10 depression apps and 10 smoking cessation apps from the Apple iTunes App Store on these measures. Krippendorff’s alpha was calculated for each of the measures and reported by app category and in aggregate.

Results: The measure for interactiveness and feedback was found to have the greatest overall interrater reliability (alpha=.69). Presence of password protection (alpha=.65), whether the app was uploaded by a health care agency (alpha=.63), the number of consumer ratings (alpha=.59), and several other measures had moderate interrater reliability (alphas>.5). There was the least agreement over whether apps had errors or performance issues (alpha=.15), stated advertising policies (alpha=.16), and were easy to use (alpha=.18). There were substantial differences in the interrater reliabilities of a number of measures when they were applied to depression versus smoking cessation apps.

Conclusions: We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews.
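The analysis hinges on computing Krippendorff's alpha separately for each rating measure across the panel of 6 reviewers and 20 apps. The snippet below is a minimal sketch of that computation, not the authors' actual analysis code (the study reports a Stata implementation): it assumes the Python `krippendorff` package and uses a hypothetical 6-reviewer rating matrix for a single binary measure such as "has password protection".

```python
# Minimal sketch: Krippendorff's alpha for one binary rating measure.
# Assumes the `krippendorff` package (pip install krippendorff); the
# rating matrix below is hypothetical example data, not study data.
import numpy as np
import krippendorff

# Rows = the 6 reviewers, columns = apps; np.nan marks a missing rating.
ratings = np.array([
    [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 0, 0, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1, 1, 0, 1, np.nan],
    [1, 0, 1, 0, 0, 1, 1, 0, 1, 1],
    [1, 1, 1, 1, 0, 1, 1, 0, 1, 1],
], dtype=float)

# Binary/categorical measures use the nominal level of measurement;
# ordinal or interval measures (e.g., star ratings) would use those levels.
alpha = krippendorff.alpha(reliability_data=ratings,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha = {alpha:.2f}")
```

Repeating this per measure, and separately for the depression and smoking cessation subsets, yields the per-category and aggregate alphas the abstract reports.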
