Automatically Assessing Quality of Online Health Articles

The World Wide Web today carries an unprecedented volume of information on a wide range of topics, and its quality varies greatly. In medicine this is a particular concern, because the consequences of health misinformation can be life-threatening. There is currently no generic automated tool for evaluating the quality of online health information across a broad range of topics. To address this gap, we apply a data mining approach to automatically assess the quality of online health articles against 10 quality criteria. We prepared a labelled dataset with 53,012 features and applied several feature selection methods to identify the best feature subset, with which our trained classifier achieved an accuracy between 84% and 90%, depending on the criterion. Our semantic analysis of the features shows the underlying associations between the selected features and the assessment criteria, which further supports our assessment approach. Our findings will help identify high-quality health articles and thus help users form sound opinions and make the right choices when seeking health-related help online.
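The pipeline summarised above (select a feature subset, then train a classifier for each quality criterion) can be illustrated with a minimal sketch. The abstract does not name the feature selection methods or the classifier used, so the chi-squared ranking and the Bernoulli naive Bayes model below are assumptions chosen only for illustration, as are all function names and the toy data for a single criterion.

```python
import math
from collections import Counter


def chi2_score(column, labels):
    """Chi-squared statistic between a binary feature column and binary labels."""
    n = len(labels)
    observed = Counter(zip(column, labels))
    score = 0.0
    for f in (0, 1):
        for y in (0, 1):
            row_total = sum(observed[(f, c)] for c in (0, 1))
            col_total = sum(observed[(v, y)] for v in (0, 1))
            expected = row_total * col_total / n
            if expected > 0:
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def select_top_k(X, y, k):
    """Return the indices of the k features with the highest chi-squared score."""
    n_features = len(X[0])
    scores = [chi2_score([row[j] for row in X], y) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def train_bernoulli_nb(X, y, features):
    """Fit a Bernoulli naive Bayes model on the selected features (Laplace smoothing)."""
    model = {}
    for c in (0, 1):
        rows = [row for row, label in zip(X, y) if label == c]
        prior = (len(rows) + 1) / (len(y) + 2)
        probs = {j: (sum(r[j] for r in rows) + 1) / (len(rows) + 2) for j in features}
        model[c] = (prior, probs)
    return model


def predict(model, row):
    """Return the class (0 or 1) with the highest log-posterior for one article."""
    best_class, best_logp = None, -math.inf
    for c, (prior, probs) in model.items():
        logp = math.log(prior)
        for j, p in probs.items():
            logp += math.log(p if row[j] else 1 - p)
        if logp > best_logp:
            best_class, best_logp = c, logp
    return best_class


# Hypothetical toy data: 6 articles, 4 binary features, labels for ONE criterion
# (1 = article satisfies the criterion, 0 = it does not).
X = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]
y = [1, 1, 1, 0, 0, 0]

selected = select_top_k(X, y, k=1)
model = train_bernoulli_nb(X, y, selected)
print("selected features:", selected)
print("prediction:", predict(model, [1, 0, 0, 0]))
```

In a setting like the one described, this procedure would be repeated once per criterion, each with its own labels and its own selected subset; the sketch makes no claim about how the reported accuracies were obtained.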
