Assessing the quality of health-related Wikipedia articles with generic and specific metrics

Wikipedia is a free, online, multilingual, and collaborative encyclopedia, and is currently one of the most significant information sources on the web. Because anyone can contribute, the quality of its information is a recurring concern. Previous studies have addressed this issue through manual evaluations and by proposing generic measures for quality assessment. In this work, we focus on the quality of health-related content. To this end, we use general and health-specific features of Wikipedia articles to propose health-specific metrics. We evaluate these metrics on a set of Wikipedia articles previously assessed by WikiProject Medicine. We conclude that generic and specific metrics can be combined to determine the information quality of health-related content. These metrics are computed automatically and can be used by curators to identify quality issues. Together with the explored features, they can also be used in approaches that automatically classify the quality of health-related Wikipedia articles.
