Toward automated assessment of health Web page quality using the DISCERN instrument

Background: As the Internet becomes the number one destination for obtaining health-related information, there is an increasing need to identify health Web pages that convey an accurate and current view of medical knowledge. In response, the research community has created multicriteria instruments for reliably assessing online medical information quality. One such instrument is DISCERN, which measures health Web page quality by assessing an array of features. In order to scale up use of the instrument, there is interest in automating the quality evaluation process by building machine learning (ML)-based DISCERN Web page classifiers.

Objective: The paper addresses 2 key issues that are essential before constructing automated DISCERN classifiers: (1) generation of a robust DISCERN training corpus useful for training classification algorithms, and (2) assessment of the usefulness of the current DISCERN scoring schema as a metric for evaluating the performance of these algorithms.

Methods: Using DISCERN, 272 Web pages discussing treatment options in breast cancer, arthritis, and depression were evaluated and rated by trained coders. First, different consensus models were compared to obtain a robust aggregated rating among the coders, suitable for a DISCERN ML training corpus. Second, a new DISCERN scoring criterion was proposed (features-based score) as an ML performance metric that is more reflective of the score distribution across different DISCERN quality criteria.

Results: First, we found that a probabilistic consensus model applied to the DISCERN instrument was robust against noise (random ratings) and superior to other approaches for building a training corpus. Second, we found that the established DISCERN scoring schema (overall score) is ill-suited to measure ML performance for automated classifiers.

Conclusion: Use of a probabilistic consensus model is advantageous for building a training corpus for the DISCERN instrument, and use of a features-based score is an appropriate ML metric for automated DISCERN classifiers.

Availability: The code for the probabilistic consensus model is available at https://bitbucket.org/A_2/em_dawid/
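The abstract does not spell out the probabilistic consensus model. The repository name (em_dawid) suggests the EM estimator of Dawid and Skene, so the following is only a minimal sketch under that assumption, not the authors' implementation. The function name `dawid_skene`, the `smoothing` parameter, the coder names, and the toy ratings are all hypothetical. Each rating is an (item, coder, label) triple; the model alternates between estimating a confusion matrix per coder and a posterior over each item's true label.

```python
from collections import defaultdict


def dawid_skene(ratings, n_classes, n_iter=50, smoothing=0.01):
    """Hypothetical helper: EM consensus over (item, coder, label) triples.

    Returns a dict mapping each item to its posterior over true classes.
    """
    items = sorted({item for item, _, _ in ratings})
    coders = sorted({coder for _, coder, _ in ratings})
    by_item = defaultdict(list)
    for item, coder, label in ratings:
        by_item[item].append((coder, label))

    # Initialise posteriors with per-item vote proportions (soft majority vote).
    post = {}
    for item in items:
        counts = [0.0] * n_classes
        for _, label in by_item[item]:
            counts[label] += 1.0
        total = sum(counts)
        post[item] = [c / total for c in counts]

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per coder,
        # conf[coder][true_class][observed_label], with additive smoothing.
        prior = [sum(post[i][k] for i in items) / len(items)
                 for k in range(n_classes)]
        conf = {c: [[smoothing] * n_classes for _ in range(n_classes)]
                for c in coders}
        for item in items:
            for coder, label in by_item[item]:
                for k in range(n_classes):
                    conf[coder][k][label] += post[item][k]
        for c in coders:
            for k in range(n_classes):
                row_sum = sum(conf[c][k])
                conf[c][k] = [v / row_sum for v in conf[c][k]]

        # E-step: posterior over the true class of each item.
        for item in items:
            weights = list(prior)
            for coder, label in by_item[item]:
                for k in range(n_classes):
                    weights[k] *= conf[coder][k][label]
            total = sum(weights)
            post[item] = [w / total for w in weights]
    return post


# Toy data (invented for illustration): three consistent coders and one
# noisy coder rate six pages on a single binary criterion (0 = no, 1 = yes).
truth = [0, 1, 0, 1, 1, 0]
noisy = [1, 1, 0, 0, 1, 1]
ratings = []
for page, label in enumerate(truth):
    for coder in ("coder_a", "coder_b", "coder_c"):
        ratings.append((page, coder, label))
    ratings.append((page, "coder_noisy", noisy[page]))

posteriors = dawid_skene(ratings, n_classes=2)
consensus = {page: max(range(2), key=probs.__getitem__)
             for page, probs in posteriors.items()}
print(consensus)
```

In this sketch the noisy coder ends up with a flatter confusion matrix than the others, so that coder's ratings carry less weight in the consensus. Real DISCERN ratings are ordinal (1 to 5 per question), which would need `n_classes=5` and labels shifted to start at 0.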
