Patterns of markup use in Wikipedia

Wikipedia is a knowledge-building community that lets anyone create and edit articles. While editing, users employ visual structure elements (VSEs), which are part of the Wikipedia markup language, to format content. Every creation and editing event is recorded in a revision history. An unsupervised learning approach was used to analyze a dataset of more than 2,000,000 revisions of 126,000 articles. Using K-Means clustering and association rule mining, a general classification of revisions was derived. Relevant classes include vandalism revisions, correction revisions, and common revisions. Each class was then studied, and patterns in the use of markup elements were identified. These results help to identify user intention, and knowledge of how VSEs are used could contribute to improving the text editors that Wikipedia currently provides, and ultimately to supporting editors' work.
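As a rough illustration of the two techniques named above, the following minimal sketch clusters a handful of toy revisions with K-Means and then computes the support and confidence of one association rule. It is not the authors' pipeline: the feature names (`link`, `heading`, `bold`, `template`), the per-revision counts, the choice of k = 2, and the example rule are all assumptions made for demonstration.

```python
import random

# Hypothetical markup features; each revision is a tuple of how many
# elements of each kind it added. The numbers are made up for illustration.
FEATURES = ("link", "heading", "bold", "template")
REVISIONS = [
    (0, 0, 0, 0), (1, 0, 0, 0), (0, 0, 1, 0),    # small edits
    (9, 3, 4, 2), (8, 4, 5, 3), (10, 3, 4, 2),   # heavily structured edits
]


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's algorithm: assign points, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def rule_metrics(transactions, antecedent, consequent):
    """Support and confidence of the rule antecedent -> consequent."""
    both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    ante = sum(1 for t in transactions if antecedent <= t)
    support = both / len(transactions)
    confidence = both / ante if ante else 0.0
    return support, confidence


labels, centroids = kmeans(REVISIONS, k=2)

# Turn each revision into the set of markup elements it uses at least once.
transactions = [
    {name for name, count in zip(FEATURES, rev) if count > 0}
    for rev in REVISIONS
]
support, confidence = rule_metrics(transactions, {"heading"}, {"template"})

print("cluster labels:", labels)
print("rule heading -> template: support=%.2f confidence=%.2f" % (support, confidence))
```

On real revision data, k would normally be chosen with a cluster validity index rather than fixed in advance, and rules would be mined over all frequent itemsets instead of being checked one at a time.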
