FlawFinder: A Modular System for Predicting Quality Flaws in Wikipedia

With over 23 million articles in 285 languages, Wikipedia is the largest free knowledge base on the web. Due to its open nature, anyone is allowed to access and edit the contents of this huge encyclopedia. As a downside of this open-access policy, quality assessment of the content becomes a critical issue, one that is hardly manageable without computational assistance. In this paper, we present FlawFinder, a modular system for automatically predicting quality flaws in unseen Wikipedia articles. It competed in the inaugural edition of the Quality Flaw Prediction Task at the PAN Challenge 2012, achieving the best precision of all competing systems and placing second in terms of recall and F1-score.
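
To make the prediction setup concrete, the sketch below shows one way such per-flaw prediction can be framed as supervised classification: one binary model per flaw type, trained on articles tagged with the corresponding cleanup template versus untagged articles. The feature representation (TF-IDF bag of words), the model (logistic regression), and the scikit-learn stack are illustrative assumptions for this sketch, not FlawFinder's actual feature set or architecture.

```python
# Minimal sketch of per-flaw quality-flaw prediction. This is NOT the
# authors' pipeline: the features (TF-IDF bag of words) and the model
# (logistic regression) are illustrative assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_flaw_classifier(flawed_articles, clean_articles):
    """Train a binary classifier for one flaw type (e.g. 'Unreferenced').

    flawed_articles: article texts tagged with the flaw's cleanup template
    clean_articles:  article texts without that tag
    """
    texts = list(flawed_articles) + list(clean_articles)
    labels = [1] * len(flawed_articles) + [0] * len(clean_articles)
    model = make_pipeline(
        TfidfVectorizer(max_features=10000),   # shallow lexical features
        LogisticRegression(max_iter=1000),
    )
    model.fit(texts, labels)
    return model


# One classifier per flaw; an unseen article is then scored by each model:
# classifiers = {flaw: train_flaw_classifier(pos[flaw], neg[flaw])
#                for flaw in flaw_types}
# predictions = {flaw: clf.predict([article_text])[0]
#                for flaw, clf in classifiers.items()}
```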
